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PREFACE 

This  book  is  an  attempt  to  bridge,  in  part,  a  gap  be- 
tween theory  and  practice  in  educational  tests  and  measure- 
ments. The  leaders  in  the  movement  for  a  quantitative 
study  of  educational  problems  have  advanced  so  far  into 
the  theory  and  more  complex  practical  phases  of  the  work 
that  a  very  large  percentage  of  the  teachers  and  students 
have  considerable  difficulty  in  following  them.  Most  of 
the  books  on  the  subject  have  either  been  more  or  less 
technical,  pre-supposing  considerable  training  on  the  part 
of  the  readers  in  the  field,  or  they  have  been  largely 
manuals  of  directions  for  giving  the  tests  and  scoring  the 
papers,  with  little  reference  to  the  theory  and  to  the  prob- 
lems which  have  confronted  those  attempting  to  measure 
educational  processes  and  products.  This  book  deals  with 
the  processes  and  problems  in  a  somewhat  evolutionary 
way  so  that  the  teachers  and  students  may  see  the  order 
in  which  the  problems  have  arisen  and  the  attempted  solu- 
tions of  them.  A  mere  manual  of  directions  for  giving 
tests  and  scoring  the  papers  will  not  develop  a  professional 
spirit  among  teachers  in  this  field.  They  must  under- 
stand the  fundamental  principles,  or  the  work  becomes 
purely  mechanical  and  non-professional.  It  has  been  the 
aim  of  the  author  to  present  the  fundamental  principles 
in  non-technical  language,  as  far  as  it  is  possible  to  do  so, 
and  to  confine  the  statistical  treatment  of  the  data  almost 
entirely  to  simple  operations  in  arithmetic. 

The  author  has  drawn  freely  on  the  works  of  Drs. 
Edward  L.  Thorndike,  Leonard  P.  Ayres,  Harold  0.  Rugg, 
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W.  C.  MeCall,  Lewis  M.  Terman,  and  others  to  which  he 
desires  to  give  due  recognition. 

The  author  desires  also  to  thank  Prof.  H.  R.  Douglass 
and  Dr.  B.  W,  DeBusk  for  reading  parts  of  the  manuscript 
and  especially  to  thank  Professor  W.  C.  Mclnnis  of  the 
Eugene  High  School  for  the  careful  reading  of  the  manu- 
script and  the  helpful  suggestions  made. 

C.  A.  G. 
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CHAPTER  I 

INTRODUCTION 

The  crying  need  of  the  hour  is  for  efficiency.  The  fact 
that  there  are  few  universal  standards  for  efficiency  does 
not  dampen  the  ardor  of  those  demanding  it.  The  commer- 
cial world  will  tolerate  almost  anything  but  what  it  con- 
siders inefficiency;  the  efficiency,  or  inefficiency,  of  our 
railway  systems  constitute  no  small  part  of  the  conversation 
of  the  shippers  and  the  traveling  public.  In  politics  we 
talk  about  an  efficient  administration  or  an  efficient  public 
servant;  in  education,  the  public  wants  an  efficient  school 
system. 

The  subject  of  efficiency  has  occupied  the  center  of  the 
stage  at  every  noted  gathering  of  educators  in  the  United 
States  for  the  last  decade.  A  belief  is  current  that  the 
education  of  the  day  is  far  short  of  its  optimum  efficiency ; 
that  there  is  something  wrong  with  the  public  schools. 
Because  of  this  criticism  the  elementary  schools  have  been 
compelled  to  face,  not,  as  of  old,  a  criticism  as  to  their 
rights  to  existence  or  their  right  to  receive  public  support, 
but  an  adverse  criticism  as  to  their  efficiency  as  measured 
by  their  human  product. 

The  educator  can  withstand  almost  any  kind  of  criticism 
but  that  which  decries  the  efficiency  of  his  school.     Effi- 
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ciency  is  one  of  the  chief  goals  for  which  he  is  striving. 
It  is  the  balance  in  which  his  success  or  failure  must  be 
weighed.  Books,  consisting  of  hundreds  of  pages  each, 
have,  as  their  chief  contents,  data  dealing  with  the  efficient 
administration  of  the  various  phases  of  school  systems. 

Recently  there  has  come  from  the  press  a  five-hundred 
page  report  on  the  Gary  public  schools  made  by  the  General 
Education  Board.  The  makers  of  this  survey  say  in  the 
opening  pages  of  their  report:^  "The  public  is  interested 
in  knowing  whether  the  Gary  schools  are  efficient  or  in- 
efficient as  now  conducted.  The  public  is  also  interested 
in  knowing  whether  such  a  plan  is  sound  or  unsound.  The 
present  study  tries  to  do  justice  to  both  points." 

The  National  Society  for  the  Study  of  Education  con- 
sidered efficiency  of  such  importance  that  it  devoted  Part 
II  of  the  Fourteenth  Yearbook  to  the  "Methods  of  Measur- 
ing Teachers'  Efficiency."  Scores  of  other  books  on  educa- 
tion make  efficiency  no  small  part  of  their  content. 

The  efficiency  programme  is  not  confined  to  the  public 
schools.  Education  is  but  one  phase  of  the  newer  social 
economy.  When  one  runs  through  the  card  index  of  the 
library  or  consults  the  Reader's  Guide,  he  is  surprised  to 
see  the  number  of  books  and  magazine  articles  dealing  with 
the  subject  of  efficiency  in  all  lines  of  work.  The  states 
of  the  Union  appoint  commissions  on  efficiency  and  econ- 
omy. Economy  and  efficiency  in  government  service  be- 
comes a  part  of  the  report  of  the  President  of  the  United 
States  to  Congress.  Books  on  efficiency  are  written  under 
such  titles  as  the  following:  "Modern  Methods  in  the  Office, 
How  to  Cut  Corners  and  Save  Money";  "Economics  of 
Efficiency";  "Psychology  for  Business  Efficiency";  "Effi- 
ciency as  a  Basis  for  Operation  and  Wages";  "Motion 


Stuart  A.  Courtis,  The  Gary  Public  Schools,  Measurement  of  Class- 
room Products  (General  Education  Board,  61  Broadway,  New  York 
City),  Introduction,  p.  8. 
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Study,  A  Method  for  Increasing  the  Efficiency  of  the 
Workman";  "Fatigue  and  Efficiency";  "The  Price  of 
Inefficiency";  "The  Human  Machine  and  Industrial  Effi- 
ciency"; "A  Symposium  on  Scientific  Management  and 
Efficiency  in  College  Administration";  and  hundreds  of 
other  books  and  articles  dealing  with  this  subject. 

The  spirit  of  the  new  education  is  that  of  social  efficiency. 
Subjects  will  no  longer  find  an  excuse  for  being  in  the 
course  of  study  in  the  words  of  the  old  song,  "We're  here 
because  we're  here,"  but  each  subject  must  prove  that  it 
is  one  of  the  best  things  that  can  be  taught  in  the  few 
short  years  of  a  child's  school  life. 

The  mechanic  has  a  perfectly  definite  way  of  measuring 
efficiency.  It  is  the  ratio  that  the  useful  work  done  hy  a 
machine  hears  to  the  total  work  done  on  it.  No  machine 
is  one  hundred  per  cent  efficient,  else  a  perpetual-motion 
machine  would  be  possible.  There  is  always  a  waste.  The 
most  efficient  machine  is  the  one  which  does  the  maximum 
amount  of  work  with  a  minimum  of  waste. 

The  efficiency  of  a  school  system  is  likewise  a  ratio. 
It  is  the  ratio  that  the  time,  energy,  and  money  spent 
bear  to  the  finished  product.  The  finished  product  in  the 
case  of  the  machine,  that  is,  the  work  done,  is  measured  in 
some  convenient  unit  of  work,  as  the  foot-pound,  the 
foot-poundal,  the  erg,  or  the  kilogrammeter.  Unfortu- 
nately, the  products  of  a  school  system  are  measurable 
in  no  such  definite  terms.  Descriptive  adjectives  with 
vague  meanings  usually  constitute  the  best  and  only 
measuring  sticks  we  have. 

Efficient  American  Citizens. — It  is  said  that  one  of  the 
chief  businesses  of  the  American  schools  is  to  prepare  in- 
dividuals for  efficient  membership  in  American  society. 
There  are  at  least  three  qualifications  for  such  member- 
ship: (1)  An  ability  to  execute  effectively  the  formal  and 
informal  duties  of  citizenship  and  carry  the  burdens  of 
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political  responsibility;  (2)  an  ability  to  labor  and  produce 
so  effectively  that  one  is  able  to  carry  his  own  economic 
load;  (3)  an  ability  to  utilize  one's  leisure  time  and  to 
function  in  an  individual  capacity  without  infringing  on 
the  rights  of  others  or  of  society  at  large.- 

Few  would  disagree  with  these  qualifications.  It  is  only 
when  we  attempt  to  particularize  and  say  just  what  and 
how  mucli  an  individual  must  have  to  execute  the  formal 
and  informal  duties  of  citizenship,  carry  the  economic  load 
and  perform  the  other  duties  mentioned  above,  that  dif- 
ferences of  opinion  arise. 

All  agree  that  the  schools  should  turn  out  boys  and 
girls  efficient  in  arithmetic,  language,  civics,  good  character, 
and  all  the  rest,  but  there  is  little  agreement  as  to  just 
what  one  must  do  to  become  efficient  in  these  things. 
Standards  of  efficiency  and  methods  of  reaching  them  have 
not  always  been  based  on  scientific  facts  which  might  be 
verified  in  the  laboratory.  Before  one  can  have  an  intel- 
ligent conception  of  efficiency,  there  must  be  some  way 
to  evaluate,  to  measure  it.  In  fact,  one  cannot  conceive  of 
efficiency  except  in  terms  of  quantity. 

It  is  the  purpose  of  this  treatise  to  set  forth  in  a  non- 
technical way,  as  far  as  possible,  some  of  the  benefits  to 
be  derived  and  the  difficulties  to  be  overcome  in  the  use 
of  measurements  in  the  field  of  education. 

Progress  Conditioned  by  Ability  to  Measure. — Progress 
in  civilization  has  depended  very  largely  on  our  ability  to 
measure.  James  Watt,  for  instance,  could  not  make  a 
steam  engine  until  men  were  able  to  make  measurements 
so  exact  that  a  cylinder  and  a  piston  could  be  built  that 
were  steam-tight  and  yet  allowed  free  play.  The  automo- 
bile of  to-day  had  to  wait  until  men  could  measure  the 
five-thousandth  part  of  an  inch,  and  the  ship's  ehronom- 

"  Alexander  Inglis,  Principles  of  Secondary  Education,  p.  342. 
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eter  until  they  could  measure  distance  five  times  more 
minute  than  that. 

New  measures  are  constantly  coming  into  use.  They  are 
no  longer  restricted  to  length,  area,  weight,  and  volume. 
New  commodities  such  as  electric  currents,  light,  heat,  and 
refrigeration  are  measured  and  sold  on  the  markets  to-day 
just  as  other  commodities  are  sold.  Progress  in  these  lines 
has  been  conditioned  by  our  ability  to  devise  new  measur- 
ing sticks  to  measure  them. 

The  physical  sciences  have  taken  every  precaution  to 
standardize  their  units  of  measurement.  At  the  United 
States  Bureau  of  Standards  in  "Washington  one  finds 
instruments  that  will  weigh  with  accuracy  down  to 
i of   an   ounce;   high-temperature   thermometers 

50.000.000  '  ° 

that  will  register  accurately  1,000  degrees  above  zero 
Fahrenheit  and  pentane  thermometers  registering  300  de- 
grees below  zero  Fahrenheit ;  saccharimeters  measuring  the 
impurities  in  sugar  by  the  twist  of  light  waves  which  are 
allowed  to  pass  through  a  sugar  solution;  Emery  testing 
machines  having  a  compressing  power  of  2,300,000  pounds 
and  a  pulling  power  of  1,150,000  pounds  per  square  inch.^ 
It  is  no  longer  a  matter  of  opinion  as  to  the  strength 
of  a  steel  girder,  for  instance,  or  the  resistive  power  of  a 
steel  rail.  The  scientist  now  speaks  with  authority  along 
these  lines.  Natural  science  has  made  its  gains  by  sub- 
stituting facts  for  opinions,  and  units  of  measure  for  mere 
guess-work.  Hershel  says:  ''Numerical  precision  is  the 
soul  of  science." 

When  the  teaching  profession  enters  the  stage  where  its 
data  and  conclusions  can  be  presented  in  quantitative  as 
well  as  qualitative  terms,  it  has  entered  upon  a  most  im- 
portant stage  of  development. 

Men  in  all  businesses  and  professions  are  becoming  quan- 

s  "A  Wonderland  of  Science,"  National  Geographic  Magazine,  Vol.  27, 
pp.  153-169. 
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titative  thinkers.  Education  is  not  an  exception.  Edu- 
cators are  seeking  to  verify  and,  in  some  cases,  to  refute 
the  established  beliefs  concerning  the  effects  of  educational 
forces  upon  human  nature.  Dogmatic  and  authoritative 
control  of  education  is  going  the  way  of  all  mere  authority 
and  dogma  in  human  affairs.  The  "popular  guessing  con- 
tests" that  have  been  going  on  in  education  as  to  which 
processes  are  the  best,  and  what  products  are  obtained  from 
them,  are  giving  way  to  experimentally  determined  facts. 
It  means  that  education  is  emerging  from  among  the  voca- 
tions and  taking  its  place  among  the  professions. 

In  the  evolution  of  our  civilization,  one  form  of  human 
activity  after  another  has  been  subjected  to  exact  measure- 
ment and  made  to  yield  its  quota  of  natural  law.  This 
movement  in  education  is  but  a  part  of  a  larger  one  which 
in  recent  years  has  extended  the  scope  of  the  applied 
sciences  and  utilized  scientific  principles  in  the  improve- 
ment of  many  lines  of  human  endeavor. 

The  American  people  are  an  extremelj'  practical  folk. 
Contemporaneous  American  life  manifests  an  abiding  faith 
in  practicality  and  efficiency.  The  American  people  believe 
in  a  creed  or  philosophy  of  life  that  really  "works"  when 
applied  to  practical  situations.  They  are  pragmatists. 
They  are  looking  away  from  first  things,  principles,  cate- 
gories, supposed  necessities,  and  looking  towards  last 
things,  fruits,  consequences,  facts.*  We  have  talked  much 
of  the  aims  of  education.  Now  we  are  asking  for  results. 
We  are  going  to  judge  the  efficiency  of  a  school  system  in 
terms  of  the  results  and  not  in  terms  of  its  aims. 

For  many  years  we  have  measured,  in  a  way,  the  effi- 
ciency of  higher  education  by  the  definition  of  what  con- 
stitutes an  institution  of  higher  learning,  which  we  find 
in  the  laws  of  the  various  states,  in  the  definitions  of  the 

*  William  James,  Pragmatism,  p.  54. 
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U.  S.  Bureau  of  Education,  the  Carnegie  Foundation  for 
the  Advancement  of  Teaching,  and  many  voluntary  associa- 
tions that  deal  with  this  phase  of  education.  These  defini- 
tions cover  such  topics  as:  entrance  requirements,  endow- 
ments, income,  curricula,  the  school  plant,  time  allotments, 
and  the  qualifications  of  the  teaching  staff.  Such  standards 
measure  the  equipment  and  materials  for  instruction  but 
not  the  product. 

Much  emphasis  has  been  laid  on  the  statement  that  the 
teacher's  power  is  exerted  in  one  generation,  while  that 
of  his  students  is  exerted  in  another;  that  for  him  who 
teaches  there  is  no  final  measure  of  the  day's  work;  that 
it  is  beyond  his  vision  in  time  and  place.  The  next  genera- 
tion may  attempt  a  full  estimate  of  his  labor,  but  he  him- 
self cannot. 

Teachers  have  accepted  these  general  statements  more 
literally  than  the  facts  warrant.  It  is  true  that  the  full 
power  of  one  mind  over  another  cannot  be  measured. 
Nevertheless,  if  the  intricate  ministry  of  teaching  is  to 
become  anything  more  than  a  crude  art,  where  blind  faith, 
subtle  intuition,  and  crude  methods  of  trial  and  error,  are 
the  rules  of  procedure,  standards  must  be  set  up  and  meas- 
urements must  be  made,  not  in  the  next  generation,  but 
in  the  midst  of  the  educational  processes  now  going  on. 

The  ideal  condition  would  be  for  the  teacher  to  see  the 
man  of  his  moulding  walking  about  full-grown  among  his 
neighbors  and  performing  his  daily  duties  and  graces.  No 
other  measure  of  one 's  work  equals  the  sight  of  the  product 
put  to  its  full  uses.  It  is  the  best  corrective  of  one's 
blunders,  the  quickest  encouragement  to  efficient  action. 
Unfortunately,  this  satisfaction  is  reserved  for  the  lesser 
craftsmen  of  life.  We  shall  have  to  be  content  with  condi- 
tions less  favorable.  This  does  not  mean,  however,  that 
because  we  cannot  measure  our  educational  products  com- 
pletely, we  cannot  measure  them  at  all.     Some  phases  of 
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our  educational  processes  and  products  are  quite  amenable 
to  measurements.  The  hazard  is  too  great  to  wait  for  the 
next  generation  to  evaluate  our  products.  Definite,  attain- 
able goals,  therefore,  are  being  set  up.  Teachers  are  being 
required  to  find  the  situation,  or  series  of  situations,  that 
will  produce  the  desired  results.  When  a  child  is  put 
through  the  education  situation,  we  want  to  know  imme- 
diately the  quality  and  amount  of  change  made.  If  the 
results  are  not  satisfactory,  we  put  him  through  the  situa- 
tion again  or  make  a  new  situation  that  will  produce  the 
change  desired. 

Working  Hypotheses  for  Acquiring  Efficiency. — No  at- 
tempt will  be  made  here  to  define  an  efficient  school  system. 
Enough  has  been  said  about  efficiency  to  give  the  general 
direction  of  the  goal  toward  which  we  are  striving.  No 
attempt  will  be  made  to  discuss  educational  creeds  unless 
measurements  become  a  factor  in  determining  them.  The 
chief  problem,  rather,  is  this;  having  decided  upon  a  de- 
sired result,  how  may  measurements  be  utilized  to  bring 
about  that  result  with  the  minimum  expenditure  of  time, 
money,  and  energy?  Our  hazy  ideas  about  an  efficient 
school  system  will  be  clarified  only  by  approaching  a  little 
closer  to  the  goal  sought.  While  we  do  not  know  exactly 
what  an  efficient  school  system  is,  we  know  the  road  that 
leads  to  it. 

Whatever  the  definition  of  an  efficient  school  system  may 
be,  the  following  propositions  may  be  readily  accepted  as 
working  hypotheses: 

1.  Of  two  school  systems  attempting  to  make  efficient 
^American  citizens,  that  system  is  the  better  ivhicli  produces 
the  desired  finished  product  with  the  minimnnn  waste  in 
time,  energy,  and  money  expended. 

2.  More  accurate  measures  than  ive  now  have  of  those 
things  that  are  measurahle  u'ill  reduce  the  waste  in  educa- 
tion and  make  our  schools  more  efficient. 
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It  is  with  the  second  proposition  that  we  are  primsirily 
concerned.  An  attempt  will  be  made  to  show  how  the 
efficiency  of  the  schools  may  be  increased  by  a  more  ac- 
curate measurement  of  certain  materials,  processes,  and 
products.  Three  activities  are  necessary  to  do  this.  It  is 
necessary;  (1)  to  persuade  the  teachers  that  measurements 
are  beneficial;  (2)  to  make  the  measurements;  (3)  to  use 
the  results  of  the  measurements  as  a  means  for  doing  better. 

Knowledge  is  the  first  attribute  of  the  man  of  science. 
It  is  knowledge,  born  of  the  travail  of  thought  and  ex- 
perience, that  differentiates  the  physician  from  the  quack ; 
the  la^^^'er  from  the  shyster;  the  statesman  from  the 
demagogue;  it  is  likewise  the  first  indispensable  element 
of  educational  sanity  and  progress.  Education  has  hitherto 
rested  on  the  foundation  of  custom.  It  must  hereafter  rest 
on  a  basis  of  scientific  knowledge.  Facts  alone  will  enable 
the  educator  to  keep  his  balance  in  the  midst  of  educational 
upheavals.  Because  of  the  lack  of  scientific  information, 
many  theories  not  justified  by  systematic  observation  have 
been  current.  As  a  result,  much  of  the  time  and  energy 
of  teachers  have  been  spent  to  a  great  disadvantage;  con- 
fusion has  been  produced ;  and  the  teaching  profession  has, 
at  times,  been  greatly  retarded. 

In  spite  of  the  fact  that  the  educational  literature  is 
replete  with  data  and  arguments  why  teachers  should  use 
measurements  to  eliminate  waste  and  do  more  effective 
teaching,  nevertheless  the  majority  of  teachers  are  not 
availing  themselves  of  this  opportunity.  Tradition  and 
habit  are  still  too  strong  to  allow  any  radical  changes.  Of 
those  teachers  who  have  used  these  newly  acquired  tools, 
many  have  treated  them  as  ''educational  curiosities"  rather 
than  as  a  means  for  more  efficient  work.  It  is  because  of 
these  conditions  that  considerable  space  is  given  in  the  next 
chapter  to  reasons  for  the  need  of  measurements. 


CHAPTER  II 

EFFICIENCY  THROUGH   MEASUREMENTS 

In  this  chapter  we  shall  discuss  some  reasons  why  meas- 
urements should  be  made,  hoping  thereby  to  persuade  a 
greater  number  of  teachers  that  measurements  are  neces- 
sary for  the  most  effective  teaching.  The  subject  will  be 
discussed  under  the  following  headings,  all  of  which  have 
to  do  with  measurements  either  directly  or  indirectly.  The 
general  thesis  is  that  the  schools  may  be  made  more  efficient 
by:  (1)  statement  of  and  adherence  to  definite  aims;  (2) 
the  elimination  of  waste;  (3)  placing  education  on  a  fact 
basis  through  educational  measurements;  (4)  the  establish- 
ment of  definite  standards;  (5)  profiting  by  the  experience 
of  those  in  other  occupations  and  professions;  (6)  meeting 
some  objections  to  educational  tests  and  measurements; 
(7)  keeping  an  accurate  record  of  all  methods  tried  and 
progress  made;  (8)  the  cultivation  of  the  confidence  and 
the  utilization  of  the  support  of  the  public. 

I.  Statement  op  and  Adherence  to  Definite  Aims 

Professor  Bobbitt  has  tersely  stated  the  situation  in  re- 
gard  to  our  aims  jn  education.  He  says  :^  * '  We  have  aimed 
at  a  vague  culture,  an  ill-defined  discipline,  a  nebulous, 
harmonious  development  of  the  individual,  an  indefinite 
moral  character-building,  an  unparticularized  social  effi- 
ciency, or,  often  enough,  nothing  more  than  escape  from 
a  life  of  work." 


1  John  Franklin  Bobbitt,  The  Curriculum  (Houghton  Mifliin  Co.), 
p.  41. 
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It  is  evident  that,  as  long  as  we  deal  with  such  rainbow 
generalities,  we  can  never  hope  to  get  definite  results.  Our 
results  never  will  be  more  definite  than  our  aims.  One 
of  the  first  things  that  the  establishment  of  standard  norms 
will  do  for  the  science  of  education  will  be  to  make  definite 
and  specific  the  aims  of  teaching. 

We  have  a  bountiful  supply  of  general  aims  of  more  or 
less  value.  Education  is  a  training  for  citizenship;  it  is 
to  prepare  for  life;  it  is  to  develop  power;  it  is  to  make 
cultured  men  and  women;  it  is  to  prepare  for  the  voca- 
tions; it  is  to  develop  a  moral  individual,  endowed  with 
the  power  of  independent  thought,  with  the  ability  to  earn 
an  honest  livelihood,  with  culture,  refinement,  and  a  broad 
and  intelligent  interest  in  human  affairs.  The  aims  of 
education  are  sometimes  stated  in  terms  of  culture  and 
sometimes  in  terms  of  productive  efficiency.  The  tendency 
now  most  in  favor  among  progressive  school  men  is  that 
emphasized  by  the  very  principle  for  which  social  economy 
stands;  that  education  is  to  be  tested  neither  by  culture 
in  the  abstract,  nor  utility  in  the  concrete,  but  by  the 
extent  to  which  any  slight  knowledge,  any  manual  dex- 
terity, and  any  useful  ''tricks"  of  spelling,  counting,  read- 
ing, etc.,  are  assimilated  into  the  organic  complex  of  per- 
sonal character. 

Value  of  General  Aims  Limited. — It  is  easy  to  agree 
with  general  aims  but  they  are  of  limited  value  in  the 
school  room.  Where  one  is  confronted  with  a  particular 
boy,  with  a  particular  task,  as  adding  a  ten-figure  column, 
such  aims  as  preparation  for  life,  citizenship,  culture, 
power,  etc.,  seem  remote  and  ineffectual.  The  controlling 
purpose  of  education  must  be  sufficiently  particularized 
that  we  may  know  when  any  part  or  unit  of  the  aim  has 
been  accomplished.  An  objective  such  as  ''good  citizen- 
ship," for  instance,  must  first  be  reduced  to  smaller  units 
such  as  being  a  good  neighbor,  a  good  parent,  a  wise  and 
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conscientious  voter,  etc.  These  units  must,  in  turn,  become 
material  for  further  analysis  until  we  finally  get  down  to 
definite  ** working  units."  The  first  thing  we  have  to 
settle  is  what  we  want  our  schools  to  produce.  Our  courses 
of  study  at  the  present  time  easily  could  be  fiUed  with  many 
objective  statements  of  aim  which  would  be  so  specific  that 
there  would  be  no  question  whether  or  not  they  were  car- 
ried out  in  practice. 

When  we  are  challenged  to  justify  the  increasing  cost 
of  education  and  the  multiplication  of  courses  necessary 
to  meet  the  demands  of  complex  life,  we  apply  the  prag- 
matic test.  For  instance,  what  practical  difference  would 
it  make  if  this  were  incorporated  into  the  course,  and  if 
that  were  left  out?  The  progressive  teacher  is  constantly 
confronted  with  the  decision  of  accepting  an  educational 
fad  for  a  scientific  fact.  Measurements  offer  a  means  by 
which  she  can  test  out  the  relative  values  of  aims  by  tracing 
them  to  their  consequences.  Tests  and  measurements,  there- 
fore, become  an  important  guide  in  evaluating  the  pro- 
posals of  educational  changes. 

A  Danger  to  Be  Avoided. — There  are  many  dangers 
incident  to  the  use  of  measurements  which  will  be  discussed 
later  in  this  chapter.  Only  one  will  be  mentioned  here, 
namely,  the  danger  of  making  the  real  aims  and  ends  of 
education  subservient  to  measurements,  instead  of  using 
measurements  to  get  the  desired  results.  The  good  physi- 
cian is  interested  primarily,  not  in  the  medicine,  nor  in 
the  methods  of  administering  it,  but  in  the  effect  it  will 
have  on  the  patient,  and  we  should  regard  him  as  a  poor 
physician  who  thinks  more  of  the  medicine  and  the  way 
he  gives  it  than  he  does  of  the  patient's  progi-ess  toward 
recovery. 

The  Technical  Scientist  and  Educational  Creeds. — It  is 
not  the  business  of  the  technical  scientist  to  tell  us  what 
shall  be  the  aims  of  education.    That  is  a  matter  of  creed. 
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Those  belonging  to  a  creed  usually  set  up  an  ideal  toward 
which  they  strive.  For  instance,  there  are  those  in  educa- 
tion who  look  primarily  to  the  subjective  results;  the  en- 
riched mind;  refined  sensibilities,  culture,  discipline,  and 
the  like.  Those  belonging  to  this  creed  say  that  education 
is  the  ability  to  live  rather  than  the  practical  ability  to 
produce. 

Another  creed  has  as  its  aim  efficient  practical  action  in 
a  practical  world.  The  individual  is  educated  who  can 
perform  efSciently  the  labors  of  his  calling;  who  can  co- 
operate effectively  with  his  fellow  in  social  civic  affairs. 
This  creed  would  have  science  studies  in  order  that  the 
facts  may  be  put  to  work  by  the  mechanic  in  his  shop, 
and  the  farmer  on  his  farm.  It  would  make  a  survey  of 
the  science  needs  of  the  community  and  teach  only  those 
things  which  make  a  direct  contribution  to  these  needs.^ 
The  scientist  in  education  has  practically  nothing  to  do 
with  the  formation  of  these  creeds.  He  simply  says  that 
you  must  make  use  of  this  means,  if  you  wish  to  reach 
this  or  that  particular  end.  But  no  technical  science  can 
decide  within  its  limits  whether  the  end  itself  is  really  a 
desirable  one.  The  technical  specialist  knows  how  he  ought 
to  build  a  bridge,  or  how  he  ought  to  dig  a  tunnel,  pre- 
supposing that  the  bridge  and  the  tunnel  are  desirable. 
Whether  they  are  desirable  or  not  is  a  question  that  does 
not  concern  him.  He  simply  says  that  if  you  wish  this 
end,  then  you  must  proceed  this  way. 

When  aims  are  definite,  measurements  are  made  on  the 
materials  of  instruction  so  as  to  meet  the  specifications  set 
forth  in  the  aim.  Everything  that  does  not  contribute 
to  the  aim  is  then  discarded  as  material  that  is  super- 
fluous. 

2  lUd.,  pp.  3-4. 
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II.  The  Elimination  op  Waste 

Limited  Quantities  of  Things  Make  Measurements 
Necessary. — If  everything  with  which  human  activity  is 
in  any  way  concerned  w^ere  unlimited,  there  would  be  no 
need  for  measurements.  Even  if  things  were  not  actually 
unlimited,  if  there  was  always  enough  of  any  one  of  them 
to  be  had  with  little  or  no  expenditure  of  energy,  we 
would  concern  ourselves  but  little  about  quantity,  and 
hence  measurements  would  be  practically  unnecessary. 
The  very  fact  that  most  of  the  things  with  which  we  deal 
are  limited,  forces  us  to  measure  in  order  to  best  conserve 
them.  A  cash  register  is  used  because  money  is  limited 
and  therefore  must  be  measured  in  order  to  conserve  it. 
Nature  has  limited  the  time  allotted  to  man  to  do  things, 
and,  because  of  this  fact,  he  has  built  labor-saving  ma- 
chines with  which  distances  are  annihilated  and  work  is 
done  as  rapidly  as  possible  so  that  time  may  be  saved. 
Because  our  available  supply  of  energy  is  limited,  we  must 
conserve  each  unit  to  be  utilized  in  doing  the  world's 
work. 

One  of  the  fundamental  principles  to  be  kept  in  mind 
in  the  solution  of  educational  problems  is  that  everything 
with  which  the  educator  works — tijne,  energy,  money, 
apparatus,  resources  of  all  kinds — are  so  limited  that  trust- 
ing them  in  the  hands  of  the  ignorant  cannot  he  endorsed, 
and  prodigality  with  them  is  crime. 

Waste  Not  Peculiar  to  American  Schools. — ^Americans 
are  the  most  wasteful  i^eople  in  the  world.  Our  bountiful 
natural  resources,  our  great  expanse  of  land,  our  seem- 
ingly exhaustless  supply  of  those  material  things  that 
supply  human  wants  have  made  us  wasteful.  Suppose,  by 
our  unscientific  methods  in  farming,  we  did  deplete  the 
soil,  was  not  there  plenty  of  new  land  awaiting  us?  What 
did  it  matter  if  millions  of  dollars'  worth  of  natural  gas 
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was  allowed  to  go  to  waste  in  obtaining  oil  so  long  as  the 
oil  fields  yielded  a  profit  ?  Thousands  of  tons  of  fish  have 
been  caught  and  wasted  because  men  have  failed  to  see 
that  the  demand  might  some  day  overtake  the  supply. 
Timber  in  our  great  forests  has  been  ruthlessly  destroyed 
because  a  short-sighted  public  did  not  stop  to  consider  that 
the  end  was  already  in  sight.  Now  that  the  new  lands  are 
about  all  gone,  many  of  the  natural  resources  wasted,  the 
population  increasing,  and  competition  growing  sharper 
every  day,  we  have  at  last  begun  to  realize  that  we  must 
conserve  while  there  is  yet  an  opportunity. 

Human  Energy  Must  Be  Conserved. — Because  human 
energy  is  limited,  it  becomes  necessary  to  economize  and 
distribute  it  in  such  ways  that  it  will  accomplish  the  most. 
If  we  put  forth  more  energy  than  is  necessary  to  do  a 
thing,  there  is  waste.  Likewise  there  is  waste  if  less  energy 
is  put  forth  than  is  necessary  to  accomplish  the  task.  We 
do  our  most  difficult  tasks  vrith  the  least  waste  of  power 
and  energy  when  we  accurately  adjust  our  energies  to  the 
task  to  be  done.  The  ends  to  be  realized  many  times  are 
remote  and  complex,  and  if  we  use  adequate  means,  dis- 
tances in  space,  remoteness  in  time,  units  of  energy  neces- 
sary to  do  the  task,  quantity  of  some  kind  must  be  taken 
into  account,  and  this  means  that  measurements  must  be 
made.  Waste  no  doubt  has  occurred  in  education  more 
because  of  a  lack  of  scientific  methods  of  conservation  than 
because  of  any  indifference  or  negligence  on  the  part  of 
teachers. 

Opportunities  for  Waste  in  Education. — When  fully 
considered,  much  of  the  great  waste  in  education  is  due 
to  our  lack  of  adequate  means  for  placing  reliable  estimates 
on  our  results  and  processes.  We  lack  in  the  matter  of 
definite,  desirable,  attainable  goals  to  be  sought  through 
a  given  topic,  or  process,  or  stage  of  work  in  a  given  sub- 
ject.    We  have  been  forced  to  work  in  a  more  or  less 
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blind,  do-and-trust-to-luck  way.  Wherever  the  application 
of  scientific  measurements  to  the  achievements  of  school 
children  has  been  made,  it  has  shown  that  great  waste  and 
unbusinesslike  methods  are  being  practiced.  A  school  sys- 
tem should  meet  the  same  requirements  that  a  business 
corporation  must  meet.  The  output  must  be  commensurate 
with  the  expenditure. 

Just  as  in  the  business  world  the  greatest  economies  are 
effected  through  small  savings,  so  the  school  must  expect 
to  make  its  greatest  gains  by  checking  up  the  small  leaks 
in  time,  energy,  and  expense.  A  saving  of  thirty-five  min- 
utes a  day,  for  instance,  would  save  a  child  one  school  year 
in  eight,  or  a  saving  of  three  and  one-half  minutes  per 
day  would  mean  a  saving  of  one  school  month  in  the  course. 

The  Business  Man  Eliminates  Waste  through  Accurate 
Knowledge  of  His  Processes  and  Products. — The  keynote 
of  successful  business  to-day  is  accurate  knowledge  of  de- 
tails applied  in  such  a  manner  as  to  eliminate  needless 
waste.  The  margin  between  profit  and  loss  is  determined 
by  the  skill  of  the  manager  in  effecting  small  savings. 
Science  applied  to  the  meat-packing  industry,  for  instance, 
showed  that  a  very  big  profit  could  be  made  by  utilizing 
what  had  formerly  been  considered  useless.  In  the  busi- 
ness world  nothing  is  left  to  chance.  No  important  action 
is  based  upon  vague  opinion  or  untested  theory.  Exact 
knowledge  must  first  be  obtained.  Seven  out  of  ten  busi- 
ness failures  are  due  to  a  lack  of  definite,  obtainable,  busi- 
ness knowledge. 

A  few  months  ago  the  writer  had  an  interview  with  an 
employee  of  one  of  the  life  insurance  companies  doing 
business  in  the  Pacific  Northwest.  He  is  what  is  known 
as  a  general  agent  whose  business  it  is  to  go  about  the 
country  and  obtain  local  agents  to  sell  life  insurance.  By 
a  system  of  tests,  measurements,  and  personal  interviews 
he  is  able  so  to  choose  his  men  that  eight  out  of  every  ten 
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chosen  become  successful  insurance  agents.  He  said  that 
if  he  could  become  90  per  cent  efficient,  that  is,  if  he  were 
able  so  to  choose  his  men  that  nine  out  of  every  ten  chosen 
would  be  successful  agents,  he  would  save  his  company; 
thousands  of  dollars  each  year. 

Nowhere  may  the  struggle  for  efficiency  be  seen  more 
concretely  than  in  industry,  for  no  field  has  a  more  con- 
stant and  compelling  motive-profit.  Here  are  exhibited 
new  processes,  new  labor-saving  devices,  new  methods  of 
planning,  more  detailed  instructions,  more  exacting 
records.  Astonishing  statistics  show  that  the  products  are 
doubled  and  tripled  as  a  result  of  those  methods.  There 
is  a  decrease  in  cost  for  the  producer  and  an  increased 
product  for  the  worker. 

Waste  of  material,  waste  in  effort,  in  energy,  in  lives, 
and  in  property,  is  reduced  to  the  minimum  in  order  that 
the  profit  may  be  large. 

Wastes  in  Education  Many  and  Varied. — Because  of 
the  many  tasks  devolving  upon  the  public  schools,  they 
have  become  the  foremost  instrument  of  social  economy. 
This  added  responsibility  has  increased  the  chances  for 
waste  and  inefficiency. 

The  startling  revelation,  made  a  few  years  ago,  that 
our  system  of  free  education  was  failing  to  give  even  com- 
plete elementary  schooling  to  the  majority  of  children 
evoked  imperious  demands  for  more  real  facts.  When 
thousands  of  children  are  reaching  maturity,  unskilled  and 
unwanted,  and  are  pointing  an  accusing  finger  at  the  school 
they  were  so  glad  to  leave,  the  public  begins  to  question 
the  returns  for  the  great  sums  so  lavishly  expended  on 
the  educational  institutions.  It  begins  a  careful  exami- 
nation of  the  waste  products,  the  juvenile  delinquent,  the 
youthful  criminal,  the  wayward  girl,  and  the  unskilled 
youth,  all  of  whom  are  unwanted  and  ill-adapted  for  em- 
ployment.    The   school  must   measure   quantitatively   its 
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fundamental  problems.  It  must  account  at  every  stage  for 
its  raw  material,  its  waste  products,  and  its  marketable 
commodities. 

The  purpose  of  the  school  is  not  that  the  child  shall 
learn,  for  he  will  learn  without  the  school.  Learning  is 
a  spontaneous  process  which  no  lack  of  schooling  can  stop 
and  no  extent  of  schooling  can  do  more  than  modify.  Its 
purpose  is  to  furnish  conditions  under  which  the  child, 
through  systematic  and  economic  effort,  wiU  accomplish 
more  for  himself,  and  that  which  is  accomplished  will  be 
of  a  better  quality,  and  the  product  will  be  obtained  by 
him  in  shorter  time  and  with  less  expenditure  of  energy 
than  if  he  learned  it  under  other  conditions. 

The  principle  of  economy  when  applied  to  the  indi- 
vidual child  presents  many  problems  of  school  organiza- 
tion, one  of  which  is  that  the  organization  must  be  such  that 
each  child  shall  have  an  opportunity  for  being  at  his  best 
all  the  time  in  all  the  subjects.  The  school  has  but  one 
purpose,  the  education  of  children.  Consequently,  in  a 
strict  sense,  economic  management  in  education  can  be 
defined  only  as  a  system  of  management  directed  toward 
the  elimination  of  waste  in  teaching  so  that  children  attend- 
ing school  may  be  duly  rewarded  for  the  expenditure  of 
their  time  and  effort.  The  point  at  issue  here  involves 
the  discovery  of  processes  that,  other  things  being  equal, 
will  perform  a  given  task  in  the  smallest  amount  of  time 
with  tlie  minimum  expenditure  of  energy. 

Four  Economies  in  Education. — We  seek  economy  in 
education  from  four  different  points  of  view :  ( 1 )  economy 
through  the  quality  of  the  product;  (2)  economy  in  the 
quantity  of  the  product;  (3)  economy  in  the  time;  and 
(4)  economy  in  the  expenditure  of  energy.  Waste  will 
not  be  prevented  and  the  maximum  accomplished  unless 
these  four  economies  move  together,  each  in  harmony  with 
the  other,  and  each,  therefore,  serving  as  a  check  on  the 
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other.  Nothing  may  be  gained  by  '^  robbing  Peter  to  pay 
Paul."  One  fundamental  defect  in  pedagogical  thinking 
has  been  the  overemphasis  given  to  some  one  feature  of 
human  development  at  the  expense  of  other  features  no 
less  important.  All  interests  must  be  carefully  considered 
before  we  can  claim  efficiency.  In  penmanship,  for  in- 
stance, quality  demands  that  the  pupil  be  taught  to  write 
well,  but  quantity,  time,  and  the  expenditure  of  energy 
demand  that  he  do  not  write  too  well.  Quantity  and  the 
writer's  time  demand  that  the  pupil  develop  speed  in 
writing,  but  the  time  and  expenditure  of  energy  of  the 
reader  are  considerations  that  insist  that  he  do  not  write 
too  rapidly. 

In  the  matter  of  style,  quality  demands  that  the  style 
be  most  legible.  This  demand  was  the  cause  a  few  years 
ago  of  our  substituting  the  vertical  system  for  the  Spen- 
cerian.  But  vertical  writing,  though  most  easily  read, 
failed  to  satisfy  the  demands  of  quantity  and  speed.  The 
attempt  to  obtain  the  proper  evaluation  of  these  four 
economies  gives  rise  to  experimental  problems  involving 
measurements.  The  efficiency  of  instruction  will  be  deter- 
mined, not  by  developing  one  phase  of  the  work  at  the 
expense  of  the  othei*s,  but  by  a  development  such  that  the 
sum  of  the  net  gains  in  the  four  economies  shall  be  the 
maximum. 

Education  should  become  efficient  as  industry  is  efficient,  i 
This  means  that  educators  should  know  what  the  finished 
product  is.  They  should  be  able  to  gauge  the  time,  quan- 
tity, and  value  of  the  elements  that  entered  into  it.  They 
should  see  that  they  actually  produce  what  they  say  they 
are  going  to  produce.  They  must  determine  the  proper 
ratio  of  product  and  time;  they  must  define  standards 
and  eliminate  waste;  if  they  are  to  be  efficient  in  con- 
formity to  the  world-wide  ideal  of  efficiency,  they  must 
employ  efficiency  tools,  one  of  the  chief  of  which  is  measure- 
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ment.     Education  cannot  assume  the  efficiency  ideal  with- 
out adopting  its  concomitant — measurement. 

III.  Placing  Education  on  a  Factual  Basis  Through 
Educational  Measurements 

If  schools  are  inefficient  for  any  one  reason  more  than 
another,  they  are  inefficient  because  of  the  ignorance  of 
facts  concerning  their  processes  and  products.  Although 
the  problems  concerning  elementary  education  have  con- 
fronted the  world  for  centuries  and  many  educators  have 
attempted  their  solution,  they  are  still  involved  in  uncer- 
tainty and  indefiniteness.  The  statements  made  on  simple 
practical  questions,  even  among  our  leading  educators,  are 
conflicting  to  the  point  of  absurdity.  Educators  are 
divided  into  creeds.  Those  belonging  to  different  creeds 
are  seldom  in  agreement  even  on  the  simplest  processes 
of  educational  procedure.  Of  course,  there  are  some  phases 
of  education  that  are  matters  of  creed.  "What  constitutes 
good  citizenship  is  a  matter  of  creed,  but  it  should  be  a 
scientific  fact  whether  the  rules  of  spelling  are  helpful, 
or  what  constitutes  reasonable  speed  and  accuracy  in  add- 
ing a  column  of  figures  in  the  fifth  grade,  or  what  con- 
stitutes a  legible  specimen  of  handwriting.  Where  every- 
thing is  "guesswork"  and  there  is  no  proof  to  offer  as  to 
who  is  right  and  who  is  wrong,  it  is  little  wonder  that 
the  ship  of  pedagogy  is  "waterlogged  in  the  sea  of 
opinions."  Hume's  description  of  the  metaphj^sical 
sciences  of  his  day  may  not  inaptly  be  applied  to  the  con- 
dition of  current  education.     He  said:^ 

Even  the  rabble  without  doors  may  judero  from  the  noise  and 
clamour  Avhich  thcv  hear  that  all  2:oes  not  well  within.     There 


'  Robert  R.  Rusk,  7?l^rocZuc^^on  to  Experimental  Education  (Longmans, 
Green  &  Co.),  pp.  1-2. 
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is  nothing  which  is  not  the  subject  of  debate,  and  in  which  men 
of  learning  are  not  of  contrary  opinions.  The  most  trivial  ques- 
tion escapes  not  our  controversy  and  in  the  most  momentous 
we  are  not  able  to  give  any  certain  decisions.  Disputes  are 
multiplied  as  if  everything  were  uncertain;  and  these  disputes 
are  managed  with  the  greatest  warmth,  as  if  everything  were 
certain.  Amidst  all  this  bustle  'tis  not  reason  which  carries 
the  prize,  but  eloquence;  and  no  man  need  ever  despair  of  gain- 
ing proselytes  to  the  most  extravagant  hypotheses,  who  has  art 
enough  to  represent  it  in  any  favorable  colours.  The  victory 
is  not  gained  by  the  men-at-arms,  but  by  the  trumpeters,  drum- 
mers and  musicians  of  the  army. 

There  is  much  speculation.  '*I  think,"  ''I  guess,"  "It 
is  my  opinion"  are  the  characteristic  phrases  in  educa- 
tion. "I  know"  is  a  phrase  that  has  scarcely  been  ad- 
mitted. 

Two  Kinds  of  Opinions  and  Their  Uses. — Generally; 
speaking,  we  may  classify  opinions  under  two  general  liead- 
ings:  (1)  expert  opinion,  and  (2)  the  opinion  of  the  lay- 
man. By  expert  opinion  we  mean  the  opinion  of  one  who, 
by  his  long  experience  and  study,  is  able  to  base  his  judg- 
ment on  data  usually  not  available  to,  or  at  least  not 
possessed  by,  the  average  individual.  For  example,  J.  E. 
Wooters  wanted  to  determine  the  dates  and  events  that 
might  be  memorized  most  profitably  by  students  in  Amer- 
ican history  in  the  seventh  and  eighth  grades.  He  sent 
questionnaires  to  the  members  of  the  American  Historical 
Association  enclosing  a  list  of  52  dates  and  requesting 
that  the  20  most  important  dates  in  this  list  be  arranged 
or  '* ranked"  in  the  order  of  their  importance.  If  other 
dates,  not  given  in  the  list,  were,  in  the  judgment  of 
those  making  the  reply,  more  important  than  those  sub- 
mitted, they  were  to  be  inserted.  When  the  answers  were 
compiled,  the  date  1776  ranked  first  in  importance,  1492, 
second,  and  so  on.  These  men  presumably  were  giving 
expert   opinion.     Being  historians  by  profession,   it  was 
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presumed  that  their  opinions  were  the  conclusions  drawn 
from  the  best  historic  evidence  available.* 

The  opinions  of  both  the  experts  and  the  laymen  may 
be  individual  or  they  may  be  the  combined,  average,  or 
median  opinions  of  a  group.  For  example,  the  judgment 
of  three  or  four  surgeons  called  into  consultation  as  to 
whether  a  certain  operation  should  be  performed  illus- 
trates group  opinion.  Generally  speaking,  group  opinion 
is  more  reliable  than  individual,  whether  it  be  among 
experts  or  laymen.  It  is  less  liable  to  be  affected  by 
personal  bias  and  superficial  characteristics. 

By  lay  opinion  we  mean  the  opinion  or  judgment  of  the 
average  individual  who  expresses  himself  on  the  various 
current  problems  and  topics,  scientific  and  otherwise,  with 
little  or  no  data  upon  which  to  base  his  judgment.  This 
type  of  evidence  has  been  ruled  long  since  out  of  court  but 
7wt  out  of  education. 

Although  the  opinion  of  the  layman  is  of  no  value  for 
scientific  purposes,  it  is,  nevertheless,  a  formidable  factor 
in  education,  simply  because  the  margin  between  technical 
information  and  lay  opinion  is  so  narrow  that  the  educa- 
tional expert  is  not  always  able  to  convince  the  public 
that  his  position  is  right. 

Opinions  Worthless  in  the  Face  of  Facts. — In  the 
absence  of  facts  opinions  reign  supreme.  In  educational 
procedure  mere  opinion  has  had  an  all-too-important  place 
on  the  programme.  Where  facts  are  available  there  is  no 
longer  any  justification  for  mere  opinion.  The  follow- 
ing illustration  will  show  the  fallacy  of  basing  any  pro- 
cedure on  mere  opinion  when  the  facts  are  available.  Sup- 
pose a  group  of  40  individuals  were  asked  their  opinions 
of  the  length  of  a  certain  room.    Let  us  suppose  that  each 

^  W.  C.  Bagley,  "The  Determination  of  Minimum  Essentials  in 
Elementary  Geography  and  History,"  National  Society  for  the  Study 
of  Education,  Fourteenth  Yearbook,  Part  I,  pp.  139-140. 
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of  them  is  an  expert  in  judging  lengths  of  rooms.  Their 
individual  judgments  may  range  all  the  way  from  45  to 
50  feet.  Then  suppose  we  take  the  average  judgment  of 
the  entire  gTOup  and  we  find  it  to  be  48  feet.  In  the 
absence  of  a  measuring  stick  this  may  be  considered  the 
best  measurement  that  it  is  possible  to  get.  Yet  it  is 
quite  evident  that  this  judgment  may  not  correspond  to 
the  facts  at  all.  It  is  possible  that  a  measuring  stick 
might  show  the  length  of  the  room  to  be  45  feet  8  inches. 
It  is  also  quite  possible  that  no  one  of  the  experts  judged 
the  length  to  be  what  the  yard  stick  showed. 

Conditions  analogous  to  this  have  prevailed  in  educa- 
tion from  time  immemorial.  Either  because  we  did  not 
have  the  facts  or  did  not  care  to  go  to  the  trouble  to  get 
them,  false  judgments  have  been  allowed  to  stand.  Educa- 
tion and  theology  are  two  fields  where  opinions  have 
reigned  supreme.  The  layman  has  been  free  to  challenge 
the  judgment  of  the  minister  and  the  teacher  because 
neither  could  prove,  nor  disprove  the  points  at  issue.  The 
facts  upon  which  judgment  should  have  been  made  have 
not  been  available.  Fortunately,  however,  assertions  are 
no  longer  the  style  in  educational  circles.  The  scientific 
spirit  of  the  age  demands  that  assertions  be  backed  up 
with  statistics  based  upon  the  results  of  experiments. 
Ipse  dixit  will  no  longer  suffice. 

Those  working  to  develop  measurements  realize  that  only 
by  getting  the  facts  can  education  become  an  art  and  a 
science,  and  its  practice  changed  from  a  vocation  to  a 
profession.  Just  as  measurements  and  the  possession  of 
facts  developed  astronomy  from  astrology,  chemistry  from 
alchemy,  and  physics  from  mystery,  so  they  will  rescue 
education  from  the  sea  of  opinions  and  place  her  in  the 
family  of  sciences  where  she  rightly  belongs. 

The  reason  why  illusion  and  unverified  hypotheses  have 
so  run  riot  through  educational  discussion  has  been  the 
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absence  of  accepted  standards  of  measurements.  We  have 
' '  guessed ' '  that  we  were  not  making  progress  in  arithmetic, 
so  we  added  50  minutes  a  week  to  make  the  work  more 
efficient.  We  have  "guessed"  that  pupils  were  making 
a  reasonable  progress  from  grade  to  grade,  but  we  never 
knew  actually  what  was  that  progress.  We  have  guessed 
all  along  the  line.  Educators  have  tried  to  solve  the  prob- 
lems by  means  of  hypotheses  based  on  psychology,  whereas 
facts  alone  will  tell  the  tale.  As  a  consequence,  we  have 
a  great  mass  of  philosophical  opinions  of  what  should  be 
done. 

Inductive  Methods  Not  Used  in  Pedagogy. — Pedagogy 
represents  a  remarkably  anomalous  condition,  for,  as  the 
department  that  points  the  way  to  the  development  of 
the  sciences,  it  itself  has  failed  to  adopt  what  it  has  long 
been  recommending  to  other  scientific  pursuits,  namely, 
the  inductive  method  of  study.  Its  work  has  consisted 
of  opinions  based  on  opinions  and,  therefore,  of  a  mass  of 
contradictory  material.  No  really  sustained  forward  move- 
ment can  be  expected  until  conflicting  \-iews  are  subjected 
to  analysis  in  the  light  of  clear  and  unmistakable  facts. 
In  the  proportion  that  we  are  able  to  retain  the  genuine 
and  reject  the  spurious,  education  will  move  forward 
among  the  sciences.  Unless  the  validity  of  educational 
opinion  is  established  by  verifiable  data,  which  any  tech- 
nically informed  person  may  apply,  educational  procedure 
will  never  become  scientific.  We  must  have  technical  in- 
formation, the  validity  of  which  is  indisputable.  When 
this  is  done,  we  shall  displace  imagination,  guesswork,  and 
oratory  as  criteria  for  reshaping  the  educational  policies. 

Lines  of  Demarcation  between  Knowable  Facts  and 
Philosophical  Opinions  Must  Be  Sharply  Drawn. — It  must 
not  be  inferred  that  all  problems  in  education  are  amen- 
able to  measurement.  As  was  indicated  above,  elementary 
education  presents  two  distinct  types  of  problems:   one 


EFFICIENCY  THROUGH  MEASUREMENTS       25 

is  involved  in  subtleties  and  belongs  to  the  depart- 
ment of  philosophy;  the  other  is  more  superficial  and  is 
in  a  large  part  a  question  of  science.  The  first  includes 
those  factors  which  relate  to  the  development  of  character 
and  the  cultural  phases  of  one's  life.  Culture  emphasizes 
the  things  of  the  mind  and  the  higher  life.  It  seeks  to 
beget  the  ability  to  enjoy  the  beautiful  and  the  good  wher- 
ever these  may  be  found.  It  seeks  the  ability  to  think 
the  best  thoughts  of  the  best  men  that  we  find  enshrined 
in  literature.  If  one  would  make  life  worth  living,  he 
must  partake  very  largely  of  cultural  things.  These  things 
are  not  directly  amenable  to  measurements,  at  least  not 
at  the  present  time.  On  the  other  hand,  the  mechanical 
skills,  positive  knowledge,  most  of  the  things  taught  in 
the  formal  subjects,  such  as  arithmetic,  grammar,  read- 
ing, spelling,  etc.,  readily  yield  to  quantitative  treatment. 
It  lies  within  our  reach  to  ascertain  the  time  consumed  by 
different  teachers  in  obtaining  certain  positive  results  and 
to  discover  what  processes  have  proved  most  economical. 

Data  on  Simplest  School  Room  Problems  Lacking, — 
Anyone  accustomed  to  the  problems  that  come  up  for 
solution  in  the  course  of  a  day's  teaching  can  think  of 
dozens  of  questions  for  which  he  would  like  a  definite 
answer,  as,  for  instance:  What  words  should  be  taught 
in  spelling?  How  many?  What  methods  should  be  used? 
How  rapidly  should  a  boy  in  the  fourth  grade  read,  that  is, 
how  many  words  per  minute?  What  is  meant  by  legible 
handwriting?  What  dates  should  be  taught  in  history? 
Are  rules  beneficial  in  the  teaching  of  spelling?  Stand- 
ard tests  and  scales  offer  the  same  effective  instruments  of 
research  in  dealing  with  these  problems  as  the  meter  and 
the  gram  offer  to  the  student  of  physics.  It  is  difficult 
to  see  how  anything  can  be  done  for  the  science  of  educa- 
tion without  them.  At  present  we  are  absolutely  unable 
to  form  an  intelligent  judgment  of  how  much  time  should 


26  EDUCATIONAL  MEASUREMENT 

be  required  to  teach  any  of  the  simple,  formal  things  in 
education.  Educators  are  not  yet  agreed  what  words 
should  be  taught,  when  to  teach  them,  and  how  long  it 
should  take  to  do  it.  It  is  in  these  fields  that  education  is 
inefficient. 

Cultural  Development  Supplemented  by  Mechanical 
Phases  of  Education. — While  everyone  engaged  in 
measuring  recognizes  that  the  highest  product  of  the  school 
training  is  the  development  of  ideals  and  character  in 
children,  some  are  becoming  convinced  that  successful  work 
in  character-building  is  absolutely  dependent  upon  the  suc- 
cessful work  of  developing  the  fundamental  skills  and  the 
formal  phases  of  education.  The  studies  that  have  been 
made  show  that  culture  and  inspirational  work  are  condi- 
tioned by  the  proper  equipment  of  the  child  with  the  me- 
chanical tools  by  which  all  mental  work  is  done.  It  is  a  mis- 
take, therefore,  to  think  that  the  time  devoted  to  the 
mechanical  skills  is  out  of  harmony  with,  and  antagonistic 
to,  cultural  education.  On  the  other  hand,  it  is  equipping 
the  child  with  the  tools  for  appreciating  the  beautiful  and 
the  good. 

No  Excuse  for  Mere  Opinion  Where  Facts  Are  Avail- 
able.— The  time  has  come  in  education  when  mere 
opinion  will  satisfy  neither  the  modern  school  adminis- 
trator, nor  the  public.  Things  in  the  business  world  are 
measured  and  standardized  with  a  precision  not  yet 
reached  in  education.  Even  those  things  which  a  few  years 
ago  were  considered  incapable  of  measurement  are  now 
measured  and  standardized  so  that  the  public  knows  ex- 
actly what  is  meant  when  a  thing  is  spoken  of  as  belong- 
ing to  a  certain  class  or  grade.  For  instance,  oranges  and 
apples  are  standardized  according  to  the  variety  and  size. 
When  a  grocer  orders  grapefruit  and  says  to  the  salesman, 
"Send  me  a  box  of  'sixty-fours,'  "  he  knows  just  what  to 
expect  and  knows  whether  tlie  order  has  been  filled  prop- 
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erly  when  it  arrives.  In  the  case  of  shoes  the  average  indi- 
vidual may  not  be  able  always  to  tell  the  difference  between 
"firsts"  and  "seconds"  but  the  expert  can,  and  it  isn't 
mere  guesswork  on  his  part.  He  has  perfectly  definite 
standards  by  which  to  judge. 

If  we  ask  a  carpenter  to  build  a  room  we  give  him  a 
set  of  specifications  of  the  length,  width,  and  shape  of  the 
room  that  we  desire  to  have  built.  When  he  has  finished 
the  job,  we  can  take  the  specifications  and  check  the  finished 
product  to  see  if  it  is  done  according  to  them. 

Any  business  man  who  is  managing  a  successful  busi- 
ness in  which  he  expects  to  make  profits  will  have  stand- 
ards from  which  he  will  work.  He  will  have  measuring 
sticks  to  measure  the  efficiency  of  his  output ;  he  will  have 
units  of  accomplishment;  he  will  have  cost-accounting 
systems;  he  will  have  standards  of  many  kinds.  He  also 
will  have  a  continual  system  of  testing  and  working-over 
what  he  is  doing  to  find  out  whether  what  he  is  doing 
is  the  best  that  can  be  done  with  the  money  he  has  at 
hand,  and  whether  the  output  is  as  large  as  he  could  well 
expect  with  the  energy  and  money  he  has  invested  in  it. 
Why  should  we  not  run  the  schools  on  the  same  scientific 
basis  ? 

Answers  to  such  questions  as  what  a  12-year-old  boy 
in  the  seventh  grade  should  do  with  a  ten-figure-column 
addition  problem  have  been,  until  recently,  a  matter  of 
opinion,  and  usually  of  conflicting  opinion  at  that.  All 
would  agree  that  a  boy  of  12  should  do  better  than  a  boy 
of  10,  but  we  have  had  no  adequate  conception  until 
recently  as  to  how  much  better  he  should  do.  We  have  had 
no  objective  standards  of  what  a  child  should  do  at  any 
specified  age.  With  the  advent  of  educational  measure- 
ments we  now  are  able  to  show  what  the  best  child  will 
do,  what  the  poorest  child  will  do,  and  what  per  cent  of 
children  will  be  able  to  make  a  particular  score. 
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Biology,  physics,  chemistry,  and  psychology  have  grovm 
by  methods  of  analysis  and  measurements.  Education 
must  imitate  this  prudence  if  she  would  command  the 
respect  of  scientific  men.  To  make  education  effective  and 
efficient  we  must  have  an  eye  for  causes  of  inefficiencies. 
We  must  discover  the  causes  of  success  in  some  cases  and 
failure  in  others.  Educational  measurements  offer  impar- 
tial and  impersonal  evidence  v/hich  cannot  be  refuted. 

Relation  of  Time  Consumed  to  the  Finished  Product. 
■ — In  any  well-regulated  factory  or  business  enterprise 
of  any  kind  there  is  a  definite  relation  between  the  time 
[consumed  and  the  amount  of  work  done.  The  time  it  takes 
to  make  the  various  parts  of  an  automobile,  or  a  wheel- 
barrow, or  a  steam  shovel,  is  known  within  quite  definite 
limits,  and  an  employee  is  quickly  brought  to  account  when 
this  relation  is  seriously  disturbed.  How  different  it  is  in 
the  school  business.  The  researches  made  in  education 
25  years  ago  by  Dr.  J.  M.  Rice,  shocked  the  educational 
world  by  revealing  the  terrible  state  of  ignorance  among 
educators  of  some  of  the  simplest  things  in  their  profes- 
sion. They  tried  to  laugh  him  out  of  court ;  to  treat  him 
as  an  eccentric  person  trying  to  perform  the  impossible 
by  measuring  things  in  human  nature.  Dr.  Rice's  own 
account  of  the  reception  given  him  at  the  national  meeting 
of  superintendents  at  Indianapolis,  in  February,  1897,  is 
both  interesting  and  illuminating,  and  when  contrasted 
with  our  present  attitude  toward  him,  gives  us  much 
encouragement  because  of  the  progress  made.  Dr.  Rice 
had  published  in  the  Forum  of  December,  1896,  an  article 
entitled,  "Obstacles  to  Rational  Educational  Reforms."  In 
reference  to  this  article  he  says :° 

In  a  way  that  I  had  not  anticipated  I  brought  it  directly  to 
the  notice  of  the  Department  of  Superintendence  at  its  annual 

*  Scientific  Management  in  Education,  pp.  17-18. 
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meeting  in  Indianapolis,  in  February,  1897.  I  had  been  invited 
to  conduct  a  round-table  discussion  on  the  three  R's,  and  had 
expected  a  handful  of  people  to  talk  the  matter  over  quietly  and 
leisurely.  But  it  so  happened  that  the  round-table  turned  out 
to  be  a  mass  meeting,  including  the  picked  educational  people 
of  the  country.  After  a  few  opening  remarks  I  endeavored 
to  arouse  discussion  on  the  question  which  I  stated  somewhat 
as  follows:  In  some  cities  ten  minutes  a  day  are  devoted  to 
spelling  for  eight  years;  in  others,  forty.  Now  how  can  we 
tell  at  the  end  of  eight  years  whether  the  children  who  have 
had  forty  minutes  are  better  spellers  than  those  who  have  had 
only  ten? 

I  had  expected  in  this  way  to  draw  out  the  ideas  of  those 
who  believed  in  much  teaching  of  spelling  and  those  who  be- 
lieved in  little  of  it,  and  thus  to  labor  for  a  compromise;  but 
to  my  great  surpiise,  the  question  tlu-ew  consternation  into  the 
camp.  The  first  speaker  to  respond  was  a  very  popular  pro- 
fessor of  psychology  engaged  in  training  teachers  in  the  West. 
He  said,  in  eifect,  that  the  question  was  one  that  could  never 
be  answered;  and  gave  me  a  severe  drubbing  for  taking  up  the 
time  of  such  an  important  body  of  educators  in  asking  them 
silly  questions. 

The  next  speaker  was  a  prominent  superintendent,  who  did 
not  like  the  way  I  had  been  treated  and  tried  to  come  to  my 
rescue.  After  this  quite  a  number  took  the  platform  in  response 
to  calls  from  the  audience  and  spoke  on  the  spelling  in  a  gen- 
eral way;  but  no  one  attempted  to  answer  the  question. 

It  is  interesting  to  note  that  when  the  same  association 
of  superintendents  met  fifteen  years  later  in  St.  Louis 
they  devoted  forty-eight  addresses  to  the  subject  of  edu- 
cational measurements. 

We  naturally  should  expect  that  the  ordinary  principles 
of  arithmetic  would  operate  in  finding  the  relation  between 
the  time  spent  and  the  results  obtained.  We  would  expect 
that  if  twice  as  much  time  is  spent  on  a  subject  in  one 
school  as  in  another,  the  ratio  of  the  finished  products 
would  bear  some  such  relation  as  two  to  one.  Such,  how- 
ever, was  not  the  finding  of  Dr.  Rice.     He  found  classes 
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to  whom  spelling  was  taught  incidentally  just  as  efficient 
as  those  who  had  forty  minutes  of  daily  drill. 

The  first  forward  step  in  a  problem  of  this  kind  is  to 
get  the  facts  on  what  is  actually  being  done.  Then  by  a 
comparison  of  the  things  accomplished  in  the  various 
schools,  tentative  standards  may  be  set  up.  The  goals 
reached  by  the  best  schools  may  be  taken  as  the  standard 
of  efficiency  toward  which  all  schools  should  work.  When 
standards  are  set  and  facts  are  gathered  on  the  standing 
of  any  school,  it  becomes  a  comparatively  easy  matter  to 
measure  progress. 

In  the  absence  of  facts  on  what  ought  to  be  accomplished 
and  what  is  actually  being  accomplished  people  have  had 
to  guess  as  to  the  efficiency  of  the  school  systems.  If,  in 
the  opinion  of  school  officials  or  the  public,  the  schools 
are  deficient  in  some  particular  phase,  the  usual  procedure 
is  to  give  more  time  to  that  part  of  the  work,  even  though 
the  supposed  weakness  may  actually  be  the  strongest  link 
in  the  chain  of  processes.  A  superintendent  thus  con- 
fronted must  do  something  to  meet  the  onslaught  of  criti- 
cism directed  against  his  school.  Not  knowing  a  better 
thing  to  do,  he  adds  a  little  more  time  to  that  phase  of 
the  work  that  has  been  pronounced  inefficient,  and  trusts 
to  luck  that  what  he  has  done  will  satisfy  the  public  and 
do  no  harm  to  the  school. 

The  following  taken  from  the  preface  of  the  Salt  Lake 
City  Survey®  illustrates  the  point  in  question: 

During  two  or  three  years  preceding  1915  a  certain  amount 
of  general  criticism  developed  in  Salt  Lake  City  with  reference 
to  the  work  of  the  schools  and  the  efficiency  of  the  instruction 
and  supervision.  The  harmonious  cooperation  which  had  pre- 
viously existed  between  the  Board  of  Education  and  the  Super- 
intendent of  Instruction  came  to  be  somewhat  impaired  and 
the  confidence  of  the  citizens  in  their  schools  somewhat  shaken. 

6  Published  by  World  Book  Co.,  Yonkers-on-the-Hudson,  N.  Y.  Used 
by  permission  of  the  publishers. 
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In  particular,  the  rather  eommon  complaint  was  raised  that  the 
administration  of  the  schools  was  not  efficient  and  that  the 
instruction  in  the  fundamental  school  subjects  was  not  produc- 
ing the  best  results.  The  superintending  authorities  did  what 
they  could  to  meet  such  criticism  by  increasing  time  allowances 
and  similar  measures  but  without  appreciable  results.  Finally 
the  superintendent  of  the  schools  for  the  city  recommended  to 
the  Board  of  Education  that  a  survey  be  made. 

The  survey  showed  that  the  Salt  Lake  City  schools  were 
strongest  in  the  very  phases  they  had  been  considered  weak- 
est by  the  public. 

An  opinion  circulated  in  the  educational  field  soon  be- 
comes the  equivalent  of  a  law.  For  instance,  the  idea 
was  circulated  that  spelling  was  a  matter  of  heredity  and 
the  heirs  to  this  wonderful  faculty  were  very  largely  con- 
fined to  the  upper  classes,  yet  Dr.  Rice  found  in  giving 
his  spelling  tests  that  the  highest  grades  made  by  any 
children  tested  were  made  by  children  whose  parents  were 
Bohemian  cigarmakers. 

In  arithmetic  he  found  the  children  in  the  slums  of 
some  cities  doing  a  great  deal  better  than  those  from  the 
best  districts  in  others.'' 

The  Public  Is  Asking-  for  a  Ledger  Aeeount  of  Our 

Business. — When  the  attitude  of  the  public  is  impartially 

considered,  one  must  come  to  the  conclusion  that  it  has 

been  both  tolerant  and  liberal.     The   debit  side   of  the 

ledger  shows  the  time  and  money  expended,  the  number 

of  teachers  hired,  the  books  and  apparatus  provided,  the 

buildings  built,  and  the  courses  of  study  made.     On  the 

credit  side  is  recorded  in  vague,  meaningless  adjectives, 

and  numbers  which  follow  no  known  rules  of  mathematics, 

the  standards  reached  and  the  progress  made.    The  public, 

and   even   the    teachers    themselves,    do   not   know    what 

"excellent"  means  in  history,  what  "A"  means  in  algebra, 

or  what  "95  per  cent"  means  in  Latin.  In  the  case  of 
__ 
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Latin,  for  instance,  a  father  would  have  a  pretty  hard  time 
telling  the  relative  amount  of  Latin  his  two  sons  knew 
if  one  were  given  a  grade  of  90  per  cent  and  the  other  a 
grade  of  45  per  cent.  Applying  the  ordinary  rules  of 
arithmetic  he  might  expect  one  to  know  twice  as  much 
Latin  as  the  other,  but  such  probably  is  not  the  case.  A 
boy  may  take  home  a  grade  of  90  per  cent  in  arithmetic 
or  a  grade  of  70  per  cent  in  penmanship  year  after  year, 
and  when  his  parents  sign  his  monthly  report  card  they 
do  not  feel  very  much  of  a  thrill  over  the  knowledge  they 
have  gained.  With  practically  no  standards  by  which  to 
work  an  accountant  would  have  a  pretty  hard  time  audit- 
ing the  books  of  a  system  of  public  schools  and  enlighten- 
ing the  public  as  to  their  efficiency. 

In  the  school  business  we  have  a  state  monopoly  in 
which  there  is  almost  no  accounting.  We  have  compelled 
the  children  to  come  to  school.  We  have  made  courses  of 
study  which  we  felt  were  good  but  which  were  very  often 
based  on  conditions  past,  rather  than  on  present  and  future 
needs.  After  the  instruction  has  been  given  to  the  chil- 
dren we  have  been  content  to  trust  to  the  growth  proc- 
esses to  bring  out  the  kind  of  development  hoped  for. 
Professor  Cubberley  likens  the  methods  employed  in  school 
to  the  old-fashioned  luck  farming,  where  the  farmer  looked 
at  the  moon,  guessed  at  the  weather,  put  in  his  crop,  and 
prayed  to  the  Lord  to  pull  him  through  another  season.^ 

Inability  to  Show  Facts  Has  Worked  a  Hardship  on  the 
Teaching  Profession. — No  one  knows  the  number  of  good 
teachers  who  have  lost  their  positions  because  they  did 
not  have  facts — hard,  cold,  tangible  facts — to  prove  to  the 
public  and  the  school  officials  that  their  work  was  meri- 
torious.    They  have  been  dismissed  many  times  because 

'Ellwood  P.  Cubberley,  "The  Significance  of  Educational  Measure- 
ments," Third  Annual  Conference  on  Educational  Measurements, 
Bulletin  No.  6  (Indiana  University,  1917),  p.  7. 
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the  demands  were  unjust.  Unless  a  teacher  has  a  correct 
idea  as  to  where  her  pupils  are,  educationally,  when  they 
come  to  her,  and  what  their  native  capacities  are,  how 
can  she  or  the  public  or  the  school  officials  know  what 
progress  they  have  made?  A  supervisor  may  go  into  a 
grade  room  and  say  to  the  teacher,  "You  are  doing  poor 
work  in  reading."  The  teacher  may  respond  by  saying, 
"No,  I  am  not.  I  am.  doing  good  work  in  the  subject  of 
reading. ' '  In  the  absence  of  a  measuring  stick  or  a  stand- 
ard of  good  reading  how  is  one  to  tell  who  is  right?  It 
is  simply  the  supervisor's  opinion  against  that  of  the 
teacher.  If,  however,  some  standard  has  been  set  such 
as  that  made  by  Professor  Courtis,  the  teacher  may  say, 
"Let  us  measure  the  pupils  and  see."  If  they  can  read 
with  the  speed  and  comprehension  set  as  the  standard,  the 
supervisor  will  have  to  withdraw  his  statement  and  the 
teacher  is  sustained.  On  the  other  hand,  if  the  class  is 
not  up  to  the  standard,  the  test  will  quickly  show  it.  The 
test  being  impartial  and  unbiased,  the  teacher  can  have  no 
complaint  to  make  against  the  supervisor  for  unjust 
demands. 

As  soon  as  school  officials  recognize  the  fact  that  measure- 
ments define  for  them  just  how  much  may  reasonably  be 
demanded,  they  will  be  unafraid  of  measurements.  They 
will  learn  the  administrative  lesson  that  it  is  better  to  know 
for  purposes  of  ordinary  routine  what''  ought  to  be  de- 
manded, than  merely  to  guess  at  conditions. 

The  business  man  used  to  be  offended  if  anyone  criticized 
his  methods  or  commented  on  his  results.  Now  he  knows 
that  his  best  friend  is  the  man  who  comes  and  tells  him 
exactly  where  he  stands.  The  one  thing  a  successful  busi- 
ness man  cannot  approve  of  to-day  is  ignorance  about  the 
results  of  his  business.  He  does  not  fool  himself  any  more, 
come  what  will  of  the  revelation. 

The  school  principal  who  knows  in  advance  where  his 
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school  is  weak  and  where  it  is  strong  is  armed  against 
criticism.  More  than  that,  he  is  guided  in  his  future  efforts. 
Some  superintendents  have  feared  the  facts  in  reference 
to  their  schools  because  they  knew  that  the  number  of  im- 
perfections revealed  in  the  survey  would  be  appalling.  One 
has  considerable  hesitancy  in  going  to  the  dentist  know- 
ing that  his  teeth  are  sure  to  ''go  to  pieces"  under  the 
keen  scrutiny  of  the  expert.  Yet  his  better  judgment  tells 
him  to  go.  In  the  presence  of  a  popular  demand  for  the 
revelation  of  the  imperfections  and  the  absolute  certainty 
that  imperfections  exist,  it  is  not  difficult  to  understand 
why  there  is  a  tendency  on  the  part  of  many  school  offi- 
cials to  combat  the  movement  towards  widespread  measure- 
ments. 

The  demand  for  measurements  is  likely  to  be  especially 
keen  if  there  is  some  parent  in  the  community  who  does 
not  like  the  superintendent  or  the  principal.  Such  a 
parent  may  desire  to  "show  up"  the  inefficiency  of  these 
officials.  He  never  believes  for  one  moment  that  the 
responsibility  for  unsatisfactory  school  work  may  be  traced 
to  the  native  limitations  of  his  child  or  to  the  home  atmos- 
phere in  v/hich  he  grows  up.  Such  a  parent  is  sure  that 
measurements  will  detect  at  some  point  a  lack  of  perfec- 
tion that  will  give  his  dislike  for  the  school  officer  the 
sanction  of  science. 

In  spite  of  the  imperfections,  the  American  people  are 
willing  to  pay  and  pay  well  for  any  reasonable  project  in 
education.  Their  liberality  cannot  be  questioned  even 
under  present  conditions.  They  may  be  made  to  go  to 
almost  any  limit  if  given  the  facts. 

IV.  Establishment  of  Definite  Standards 

It  is  obvious  that  one  must  have  some  kind  of  a  stand- 
ard before  anything  may  be  judged.  If  one  says  he  has 
a  good  suit  of  clothes,  he  is  basing  his  judgment  on  some 
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standard  or  standards.  It  may  be  good  because  it  will 
wear  well,  or  because  it  fits  perfectly,  or  because  it  holds 
its  shape  well  when  pressed.  The  merchant  in  selling  the 
suit  may  have  called  attention  to  the  color;  that  it  will 
not  fade ;  that  it  is  all  wool ;  that  the  weave  is  of  the  latest 
design,  etc.  The  purchaser  takes  the  data  submitted  by 
the  merchant  and  measures  the  facts  by  his  standard  of  a 
good  suit.  He  knows  that  a  good  suit  will  wear  well ;  that 
it  must  be  all  wool;  hold  its  shape  well;  fit  perfectly,  and 
be  pleasing  as  to  colors.  There  will  always  be  differences 
of  opinion  as  to  which  of  these  qualities  should  have  the 
most  weight.  For  instance,  two  suits  may  be  of  the  same 
price,  but  one  is  made  of  better  material  than  the  other, 
while  the  second  may  be  more  pleasing  as  to  color.  One 
customer  may  prefer  to  take  the  poorer  quality  of  goods 
in  order  to  get  the  proper  colors  or  weave,  while  another 
may  place  practically  all  the  emphasis  on  the  wearing 
qualities  of  the  garment.  Experience  has  taught  the 
purchaser  what  his  standards  should  be. 

When  we  make  a  similar  comparison  to  a  school  system, 
we  find  conditions  decidedly  changed.  In  the  first  place, 
the  school  official  cannot  furnish  the  individual  about  to 
judge  a  school  system  with  the  facts  as  the  merchant  could 
the  purchaser  of  the  suit  of  clothes.  In  the  second  place, 
he  has  not  had,  until  recently,  any  way  of  knowing  what 
a  school  system,  or  a  particular  grade  in  a  system,  ought 
to  do.  Suppose  he  were  told  the  sixth  grade  could  read 
a  certain  selection  at  the  rate  of  150  words  per  minute 
and  could  reproduce  80  per  cent  of  the  ideas  found  in  it, 
"Would  he  consider  this  a  model  class  as  to  reading,  a 
mediocre  class,  or  a  poor  one?  Or  suppose  he  were  told 
that  in  a  certain  school  system  the  multiplication  tables 
were  taught  to  a  beginning  fourth-grade  class  in  twenty 
lessons  so  that  95  per  cent  of  all  the  children  in  that  grade 
knew  them  perfectly.     Would  this   be   considered  good. 
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mediocre,  or  poor  as  far  as  time  is  concerned?  In  other 
words  what  is  the  standard  as  to  time  in  teaching  the 
multiplication  table  1 

A  bricklayer  has  a  standard  and  knows  how  long  it 
ought  to  take  to  lay  1,000  bricks.  A  man  shingling  or  lath- 
ing a  house  knows  how  long  it  ought  to  take  to  cover  100 
square  feet  of  surface,  or  how  long  it  takes  to  nail  on  1,000 
shingles  or  laths.  All  such  work  is  broken  up  into  definite 
units  of  accomplishment,  and  standards  are  set  for  each 
unit. 

There  are  hundreds  of  things  in  education  that  might 
be  broken  up  into  definite  units  of  accomplishment,  and 
standards  of  achievement  set  for  each  unit.  In  our  country, 
where  elementary  education  is  characterized  by  the  absence 
of  system,  it  is  not  unusual  for  individuals,  whether  edu- 
cators or  laymen,  to  examine  a  class  with  a  set  of  questions 
selected  in  an  arbitrary  way  and  judge  by  the  results 
whether  or  not  the  teacher  has  done  satisfactory  work.  So 
long  as  we  have  no  definite  standards,  judgments  based  on 
the  results  of  an  examination  may  continue  to  do  a  gross 
injustice  in  estimating  both  the  qualifications  of  the 
teachers  and  the  value  of  the  methods  employed  by  them. 

Three  Kinds  of  Standards  Needed. — At  least  three 
kinds  of  standards  are  needed  in  education:  (1)  standards 
as  to  quantity;  (2)  standards  as  to  time;  (3)  standards 
as  to  quality.^ 

1.  Standards  as  to  Quantity. — The  first  step  toward  plac- 
ing elementary  education  on  a  scientific  basis  must  neces- 
sarily lie  in  determining  what  results  reasonably  might  be 
expected  at  the  end  of  a  given  period  of  instruction.  Of 
course,  the  first  thing  that  determines  the  standard  must 
be  the  degree  of  capacity  that  the  average  child  has  for 

'W.  W.  Black,  "The  Movement  for  Greater  Economy  in  Educa- 
tion," Second  Annual  Conference  on  Educational  Measurements, 
Bulletin  No.  11  (Indiana  University,  1915),  pp.  7-12. 
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the  type  of  training  that  we  want  to  give  him.  We  cannot 
fix  the  standards  irrespective  of  the  child.  The  theoretical 
ideal  of  perfection  must  be  overthrown  and  rational  de- 
mands made,  not  on  what  an  ideal  child  should  do,  but 
on  what  the  average  child,  as  he  comes  to  the  teacher,  is 
expected  to  do.  Then  we  can  venture  to  tell  the  parents 
with  assurance  that  their  children  in  the  fifth  grade,  for 
instance,  are  as  good  as  the  average  even  if  they  misspell 
50  per  cent  of  the  words  in  a  certain  spelling  test. 

The  standard  set  will  be  subject  to  many  conditions  and 
will  have  to  be  justified  from  many  points  of  view.  It 
will  be  open  to  question  from  the  point  of  view  of  economy, 
organization,  and  method.  Because  of  the  interrelation  of 
these  different  factors  in  determining  efficiency,  such  ques- 
tions as  the  following  arise :  Does  the  standard  demand  too 
much  or  too  little  in  time  and  energy?  Could  the  in- 
dividual, under  a  different  organization  and  method,  meet 
the  requirements  of  the  standard  more  economically? 
Could  he,  under  a  different  method,  raise  the  standard  with 
the  same  expenditure  of  time  and  energy?  Is  it  desirable 
to  raise  the  standard?  Is  the  method  such  that  the  pupil 
has  developed  a  sufficient  degree  of  ability  in  applying 
the  knowledge  indicated  in  the  standard? 

Definite  standards  should  be  required  in  determining 
and  checking  the  applications  of  the  principles  of  method. 
When  our  standards  as  to  quantity  are  determined,  the 
teacher  can  tell  exactly  when  the  child's  mechanical  work 
is  completed.  This  would  enable  him  to  determine  the 
amount  and  character  of  drill  work  to  be  done. 

It  is  not  what  the  teacher  does  that  counts.  It  is  what 
the  child  does  and  thinks,  and,  until  our  work  is  organized 
to  take  advantage  of  these  factors,  we  cannot  hope  to 
improve  the  efficiency  very  much.  We  must  know  exactly 
what  is  meant  by  "satisfactory  results,"  then  no  time 
would  be  wasted  when  the  goal  is  reached.    Many  pupils 
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have  been  kept  at  writing  and  other  mechanical  drills 
long  after  satisfactory  results  have  been  reached  simply 
because  the  teacher  did  not  know  when  the  pupil  arrived. 
There  is  a  great  deal,  too,  to  be  said  for  definite  standards 
for  the  psychological  effect  they  have  on  both  the  pupil 
and  the  teacher.  Each  likes  to  know  how  well  he  is  doing 
and  how  near  he  is  to  the  goal. 

It  may  be  argued  that  it  would  be  impossible  to  secure 
a  definite  standard  for  measuring  results  generally 
applicable  in  our  country  on  the  ground  that  the  needs 
of  our  people  vary  in  different  localities.  While  this  senti- 
ment deserves  recognition,  it  will  become  apparent,  during 
the  course  of  this  chapter,  that  proper  attention  to  local 
conditions  in  the  conduct  of  our  elementary  schools  would 
not  tend  in  the  least  to  alter  the  measurement  plan  as  a 
whole. 

2.  Standards  as  to  Time. — AVlien  our  standards  as  to 
quantity  have  been  determined,  our  attention  may  then 
be  directed  toward  the  discovery  of  short-cuts  in  educa- 
tional processes  that  will  save  the  child's  time.  All  would 
agree  that  we  should  devote  a  reasonable  amount  of  time  to 
get  a  reasonable  result.  But  what  constitutes  a  **  reason- 
able time"  is  the  unsolved  problem.  To  arrive  at  a  con- 
clusion in  this  matter  we  must  find  how  much  time  has 
been  given  to  a  subject  in  the  best  schools  where  reason- 
able results  have  been  obtained  and  make  our  calculations 
accordingly. 

Because  of  a  lack  of  standards,  we  no  doubt  have  ex- 
pected far  too  much  of  pupils  on  some  occasions,  and  on 
other  occasions  they  could  have  done  twice  as  much  very 
easily.  William  James  has  pointed  out  that  the  average 
man  lives  far  within  his  limits  and  possesses  powers  of 
various  sorts  which  he  habitually  fails  to  use.  There  is 
little  doubt  that  the  requirements  of  school  children  are 
far  within  their  limits  in  many  instances,  and  that  they 
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could  easily  do  two  or  three  times  the  work  they  now  do 
in  the  same  time. 

3.  Standards  as  to  Quality. — Much  already  has  been  said 
about  quality;  so  a  short  discussion  will  suffice.  When  a 
child  can  write  a  specimen  of  handwriting  equivalent  to 
Quality  13  or  14  on  the  Thorndike  Scale,  for  instance,  he 
is  said  by  competent  judges  to  write  well  enough  for  all 
practical  purposes.  Any  further  time  spent  in  improving 
quality  above  this  point  is  a  questionable  procedure  unless 
it  is  definitely  known  that  the  student  will  be  called  upon 
to  do  clerical  work  that  may  require  a  higher  standard. 
The  point  is,  that  standards  give  focus  and  direction  to 
the  work  and  fix  a  point  above  which  drill  may  not  be 
profitable.  Many  boj^s  and  girls  in  school  whose  hand- 
writing is  equivalent  to  Qualities  14  to  17  on  the  Thorn- 
dike  Scale  are  required  to  practice  daily  when  their  arith- 
metic and  grammar  need  their  time  much  more. 

Standards  Changing. — Some  object  to  standards  on  the 
ground  that  they  are  constantly  changing  and  for  that 
reason  are  really  not  standards  at  all.  The  argument  is 
poor,  however,  because  that  is  exactly  what  we  should 
expect  if  progress  is  to  be  made.  That  is  what  happens 
in  business  life  everywhere.  While  standards  do  change, 
yet  they  are  stable  enough  to  justify  their  determination 
and  are  absolutely  essential  to  the  best  school  work.  As 
a  matter  of  fact,  all  teachers  have  them  but  they  are  sub- 
jective and  for  that  reason  are  not  as  valuable  as  they 
ought  to  be. 

V.  Profiting  by  the  Experience  of  Those  in  Other 
Occupations  and  Professions 

In  the  manufacturing  world  inventions  and  improve- 
ments are  usually  made  in  one  of  two  ways:  A  manu- 
facturing firm  may  "go  to  school  to  its  competitor"  and 
learn  new  ways  of  doing  things,  or  it  may  offer  a  reward 
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to  its  employees  who  can  devise  new  and  better  ways  of 
doing  things.  Any  patent  which  is  a  labor-saving  device 
is  quickly  appropriated  by  men  in  all  lines  of  business. 
There  is  no  hesitancy  in  appropriating  new  ideas  irrespec- 
tive of  their  source,  the  only  question  being,  Will  tliey 
work?  In  the  recognized  field  of  science,  such  as  physics, 
chemistry,  medicine,  etc.,  the  members  of  the  profession 
are  not  only  wiUing  to  learn  from  each  other  but  they  are 
compelled  to  do  so  under  penalty  of  law.  Those  who  fail 
in  practice  to  give  due  recognition  to  important  discov- 
eries are  held  responsible  for  the  consequences. 

If  education  is  to  become  a  science  it  must  utilize  scien- 
tific methods  just  as  other  businesses  and  professions  have 
done.  We  must  discover  some  truths  in  regard  to  educa- 
tional processes  which,  if  ignored  by  the  teacher,  will  make 
him  liable  to  prosecution  for  malpractice  just  as  the 
physician  who  has  bungled  the  setting  of  a  bone. 

Teachers  make  mistakes  year  after  year  simply  because 
no  record  is  made  of  their  procedure  and  because  there 
are  no  signposts  to  guide  the  new  ones  away  from  the  pit- 
falls. Successful  businesses  and  professions  are  not 
operated  in  such  a  haphazard  manner. 

Our  government  maintains  a  Bureau  of  Standards  at 
Washington  with  exact  units  for  all  measures  of  mass, 
length,  and  time,  with  a  large  number  of  derived  units. 
For  all  current  work  in  physical  science  these  standard 
units  are  essential  and  without  them  modern  physics  and 
chemistry  could  not  exist.  Every  new  discovery  or  in- 
vention in  science  must  be  stated  in  terms  of  them,  and 
every  physical  and  commercial  enterprise  in  modern  civil- 
ization is  absolutely  dependent  on  them.  It  is  not  surpris- 
ing that  a  physicist  with  highly  refined  instruments  for 
physical  measurements  should  look  doubtfully  upon  the 
yet  immature  efforts  of  measuring  mental  qualities  and 
mental  achievements.     Would  it  be  possible  in  education 
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to  get  the  same  kind  of  measurements  we  have  in  physical 
science,  as  measurements  in  length  and  weight?  We  can 
tell  a  boy's  stature  in  a  ijerfectly  definite  way  so  that 
everybody  in  the  world  will  laiow  what  we  mean.  We 
can  tell  the  weight  of  a  person  or  the  number  of  pounds 
he  can  lift,  and  the  world  accepts  our  measures  under- 
standingly  and  without  question  because  the  units  have 
been  standardized. 

The  following  examples  from  the  business  world  indicate 
the  great  painstaking  care  that  successful  business  takes 
to  see  that  efficiency  is  maintained  throughout  its  system, 
and  how  responsibility  is  placed  on  each  one  in  the  system 
to  carry  out  his  particular  part  of  the  programme. 

In  the  Middle  West  a  great  corporation  manufactures 
stoves  in  large  quantities.  The  shop  history  of  each  stove 
is  kept  in  detail  from  raw  material  to  the  shipment  of 
the  finished  product.  The  name  of  each  workman 
responsibly  concerned  in  the  inspection  of  the  raw  material, 
in  the  m.aking  of  any  part,  in  the  examination  and  ship- 
ment of  each  part,  is  recorded.  A  complaint  from  the 
purchaser  immediately  locates  the  responsibility  upon  the 
workman  at  fault.  The  article  upon  which  all  this  pains- 
taking care  is  lavished  sells  from  fifteen  dollars  up.  The 
method  of  successful  business  makes  it  possible  to  place 
responsibility  where  it  belongs.  The  public  schools  might 
well  imitate  this  prudence. 

The  Manufacturer  Has  Specifications. — The  manufac- 
turer knows  exactly  what  he  is  trying  to  do.  He  draws 
his  specifications  according  to  the  needs  of  the  market 
he  is  trying  to  supply.  He  knows  how  much  of  his  out- 
put is  efficiently  manufactured  and  how  much  raw  material 
goes  to  waste  in  the  dump  pile.  His  object  is  to  obtain 
a  more  satisfactory  output  so  there  will  be  a  larger  profit 
for  the  business. 

The  ideals  and  processes  of  scientific  method  in  educa- 
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tion  are  in  salient  respects  similar  to  those  that  are  re- 
shaping the  processes  of  industry.  In  education  as  in 
industry  the  scientific  idea  is,  at  base,  analj^tic  scrutiny, 
exact  measuring,  careful  recording,  and  judgment  on  the 
basis  of  observed  facts.^° 

The  school  principal  who  tests  his  school  is  guided  in 
his  future  efforts.  The  fact  that  adverse  criticism  causes 
no  shock  is  of  some  importance,  and  the  positive  result 
is  that  the  school  is  stimulated  to  improve  itself.  We  have 
awakened  to  a  startled  realization  that,  in  education  as 
in  other  forms  of  organized  activity,  applied  science  will 
do  the  work  much  better  than  the  old  trial  and  error 
methods.  Even  those  processes  that  have  rested  secure 
in  the  sanction  of  generations  may  be  improved  by  the 
application  of  science. 

In  dealing  with  the  application  of  scientific  methods 
to  education  and  to  industry  we  must  ever  bear  in  mind 
two  fundamental  distinctions.  These  relate  to  the  use  of 
time  and  the  types  of  product  in  these  two  kinds  of  activi- 
ties. In  industry  the  finished  product  conforms  to  a  definite 
pattern.  It  is  a  constant.  The  variable  is  the  time.  If 
the  finished  product  is  to  be  a  wheelbarrow  of  a  certain 
design,  the  problem  is.  How  long  will  it  take  to  turn  out 
a  wheelbarrow  according  to  the  specifications  laid  down 
in  the  pattern?  The  task  to  be  done  is  always  definite. 
In  education  it  is  different.  The  time  is  a  constant,  eight 
years  in  the  grades,  for  instance.  There  is  no  definite 
pattern  according  to  which  we  try  to  mould  the  lives  of 
the  boys  and  girls.  Their  native  capacities,  aptitudes,  and 
proclivities  are  different.  We  expect  training  to  develop 
more  differences  than  fewer.  These  fundamental  differ- 
ences must  be  kept  in  mind  when  business  methods  and 
education  are  compared.^^ 

1" Leonard  P.  Ayres,  "Making  Education  Definite,"  ibid.,  p.  87. 
"  Ibid.,  pp.  87-88. 
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VI.  Meeting    Some   Objections   to    Educational    Tests 
AND  Measurements 

In  launching  any  great  campaign  for  reform  in  method 
and  ways  of  doing  things,  one  must  expect  considerable 
opposition  from  not  only  those  engaged  in  the  business 
but  from  outsiders  as  well.  Meeting  the  arguments  against 
such  movements  becomes,  therefore,  as  vital  a  factor  in 
getting  the  new  methods  in  operation  as  some  of  those 
which  are  very  much  closer  to  the  real  problems  to  be 
solved.  In  this  part  of  the  discussion  we  shall  note  some 
of  the  objections  which  have  been  raised  in  reference  to 
tests  and  measurements  in  education. 

Educational  measurement,  like  all  new  movements,  has 
its  scoffers  and  its  zealots.  It  is,  of  course,  easy  to  point 
out  imminent  dangers  in  measurements.  Like  all  new 
movements  it  will  have  to  run  the  gauntlet  of  criticism. 
Measurement  in  education  presupposes  commensurable 
quantities,  yet  it  is  known  that  many  educational  products 
are  incommensurable.  There  are  spiritual  and  material 
products  in  education.  Over  the  former  there  will  ever 
be  the  veil  of  mystery  and  doubt.  Man  spiritually  has 
always  been  the  enigma  of  existence  and  will  continue  to 
be  so.  We  can  never  plot  the  curve  of  genius  nor  measure 
the  unit  of  inspiration.  But  there  are  a  vast  number  of 
educational  products  that  are  measurable.  The  day  of 
the  educational  engineer  is  at  hand.  The  day  of  educa- 
tional opinion  will  go,  and  the  swaj'  of  dominant  person- 
alities in  education  will  be  limited  by  facts. 

When  a  new  movement  is  launched,  the  best  thing  to 
do  is  not  to  endorse  it,  or  condemn  it,  but  to  study  and 
understand  it.  This  is  not  what  always  happened  in  edu- 
cation, however.  But  the  opposition  to  change  is  not 
peculiar  to  education.  It  makes  its  appearance  in  other 
professions  and  industries.     The  important  thing  is  to  be 
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able  to  meet  the  objectors  with  hard,  cold,  irrefutable 
facts  that  will  ijrove  their  contentions  to  be  wrong.  The 
objections  to  tests  are  many  and  varied.  We  shall  note 
a  few  of  the  more  common  ones. 

1.  Tests  will  not  endure. — Some  opposing  the  testing 
movement  saj^  that  tests  will  not  endure  the  ravages  of 
time  and  change;  that  they  are  of  temporary  value,  and 
those  we  are  using  to-day  will  be  ''scrapped"  for  some- 
thing else  to-morrow.  In  answer  to  this  objection  all  we 
can  say  is  that  v/e  do  not  expect  them  to  endure  in  their 
present  imperfect  form.  Progress  is  made  by  casting  aside 
the  outgrown  and  imperfect  tools  for  the  more  modern 
instruments.  Other  sciences  have  progressed  in  this  way. 
The  pseudo-sciences  of  ancient  times  supplied  the  founda- 
tions for  the  later  scientific  achievements.  In  any  advanc- 
ing society,  old  knowledge,  old  philosophies,  and  old  cul- 
ture must  constantly  be  in  a  state  of  reconstruction  to 
keep  pace  with  the  race's  progress. 

2.  The  child  mind  is  too  complicated  to  measure. — There 
are  those  who  say  that  the  mind  is  complicated  by  so  many 
elements  that  enter  into  its  development  that  no  definite 
conclusions  can  be  drawn.  They  are  supported  in  this 
view  by  the  fact  that  even  broad-minded  teachers  of  wide 
experience  differ  on  the  most  elementary  points  coming 
under  their  daily  observation.  And  this  further  item  may 
be  mentioned  in  their  favor,  that  even  the  same  teachers 
are  constantly  changing  their  views;  they  no  longer  be- 
lieve in  one  year  what  they  firmly  believed  the  year  before ; 
and  a  year  later  they  will  begin  to  feel  that  their  second 
theory  was  wrong  and  their  first  was  right,  and  so  on 
indefinitely.  This  state  of  affairs  comes  about  because 
judgments  have  not  been  based  on  facts  but  on  mere 
opinions.  "Whatever  the  reasons  may  be  for  changing  their 
opinions,  educational  measurements  tend  to  stabilize  them 
because  they  are  based  on  facts. 
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Some  teachers,  for  instance,  will  tell  us  we  cannot 
measure  thought  progression  in  English  composition,  never- 
theless those  same  teachers  mark  one  composition  75  per 
cent,  another,  77  per  cent,  and  a  third,  74  per  cent.  When 
they  have  had  75  per  cent  as  a  passing  mark  they  have 
continually  refused  to  promote  those  who  were  given  a 
grade  of  74. 

3.  The  judgment  of  a  competent  man  is  tetter  tJiari 
scales. — A  strong  objection  is  made  to  scales  on  the  ground 
that  the  common-sense  judgment  of  a  first-rate  man  is 
better  without  these  units  and  scales  than  the  action  of 
a  stupid  or  incompetent  man  with  them.  The  important 
thing  is  not  whether  a  first-rate  man  can  do  better  with- 
out such  scales  than  an  incompetent  one  with  them,  but 
rather  that  the  efficiency  of  each  is  increased.  On  the 
other  hand,  it  is  precisely  the  work  of  science  to  get  good 
work  done  by  those  who  are  rather  mediocre.  Thanks  to 
the  progress  of  science,  we  can  now  solve  problems  that 
Aristotle  could  not.  We  would  all  prefer  to  have  a  stupid 
doctor  of  to-day  who,  nevertheless,  understood  the  use  of 
antiseptics  and  antitoxins  than  Galen  or  Hippocrates, 
though  in  respect  to  common  sense  there  would  be  no 
choice. 

The  dangers  of  measurements  are  as  real  and  imminent 
as  the  advantages  are  self-evident.  Dangers  will  arise  from 
the  mass  of  superficial  and  erroneous  results  that  will 
certainly  be  presented  to  the  educational  world  in  the  guise 
of  scientific  contributions  applied  to  pedagogy.  But  we 
must  welcome  all  these  contributions,  challenge  them,  and 
attempt  to  verify  them.  The  sciences  cannot  escape  in- 
jury if  their  results  are  forced  into  the  rush  of  the  day 
before  the  fundamental  ideas  have  been  cleared  up  and 
an  ample  supply  of  facts  collected. 

4.  Tests  tend  to  reduce  all  educational  worh  to  a  dead 
level  with  no  allowance  for  individual  differences. — As  to 
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the  tendency  to  obtain  uniform  results,  the  effect  is  meant 
to  be  just  the  opposite  to  uniform.  Our  surveys  are  giv- 
ing us  a  knowledge  of  conditions,  both  in  school  and  out 
of  school,  under  v^hich  the  individual  child  lives.  Physical 
tests  will  tell  us  more  accurately  than  we  have  been  able 
to  know  heretofore  the  ability  of  each  individual.  With 
this  accurate  knowledge  of  the  individual  child,  there  is 
no  way  open  but  to  make  his  self-furnished  standards  the 
guides  in  his  case.  This  charge  of  reducing  all  individuals 
to  a  dead  level  is  born  of  a  complete  misunderstanding  of 
the  aims  and  processes  of  the  new  method.  Its  aim  is  not 
uniformity  but  individual  development.  People  fear  lest 
educational  experiments  will  make  children  lose  their  spon- 
taneity, or  spirituality,  or  something  of  the  sort.  They 
feel  a  certain  skepticism  about  tests  and  scales.  We  should 
not  care  much  for  the  teacher  who  did  not  have  a  great 
interest  in  the  finer,  subtler  qualities  of  character  and 
tastes.  At  the  same  time  an  artist  can  preserve  all  his 
interest  in  the  finer,  subtler  qualities  of  taste  and  still 
use  the  compass  points  to  measure  his  distances  as  he 
draws.  * '  Certainly,  that  old  rebuke  made  in  the  first  days 
of  child-study  movement  that  if  we  were  at  all  scientific 
with  our  children  we  would  come  to  love  them  less  is  out 
of  place.  It  is  the  mothers  who  love  their  babies  most  who 
weigh  them  oftencst. ' '  ^-  The  mother  who  weighs  her  baby 
every  month,  and  to  an  ounce,  is  in  these  days  precisely 
the  mother  who  loves  her  baby  on  the  average  as  well,  if 
not  better,  than  others. 

Teachers  must  not  expect  tests  to  do  everything  for 
their  schools.  The  rather  mediocre  intellect  of  youth  which 
repels  information  will  not  blossom  out  overnight  by  some 


"Edward  L.  Thorndike,  "Units  and  Scales  for  Measuring  Educa- 
tional Products,"  Proceedings  of  a  Conference  on  Educational  Measure- 
ments, Bulletin  No.  10  (Indiana  University),  pp.  128-141. 
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sort  of  hocus-pocus  of  measurements  into  a  marvelous 
genius  who  will  learn  anything  set  before  him.  A  part 
of  the  trouble  has  been  the  fact  that  our  specifications 
have  not  always  been  dictated  by  the  needs  of  the  future. 
Too  often  they  have  been  framed  by  those  who  are  think- 
ing in  terms  of  the  past.  We  wish  to  train  the  children 
in  our  schools  that  they  may  earn  eificiently  and  com- 
fortably their  daily  bread,  but  most  of  us  want  them  to 
become  more  than  skilled  bricklayers,  carpenters,  iron- 
workers, stenographers,  telegraphers,  more  than  capable 
physicians,  lawyers,  and  merchants.  If  practical  efficiency 
is  all  the  American  schools  can  do  for  a  child,  they  are 
certainly  far  from  efficient. 

Education  will,  of  course,  always  need  its  poets,  its 
artists,  and  craftsmen,  as  well  as  its  managers  and  men 
of  science,  but  it  needs  these  also.  There  is  no  reason 
why  the  artistic  life  should  be  impeded  by  measurements. 
Of  course,  we  cannot  subject  all  human  life  to  mathe- 
matically accurate  tests,  certainly  not  in  the  near  future. 
Human  life  is  deeper  than  all  our  mathematics. 

This  agitation  for  better-measured  products  is  not  con- 
fined to  education.  Commerce,  industry,  politics,  philan- 
thropy, and  other  great  fields  of  endeavor,  are  similarly 
agitated.  Before  the  impact  of  oncoming  generations,  long- 
worshiped  and  long-cherished  idols  are  falling.  The  old 
order  yields  slowly.  It  rejects  the  new  as  propaganda. 
Thus  the  tide  of  conflict  moves  back  and  forth  in  the  great 
fields  of  human  endeavor,  disclosing  in  the  lull  of  struggle 
the  inevitable  process  of  change. 

5.  Tests  measure  so  small  a  part  of  intellectual  life  that 
they  are  not  indicative  of  general  ability. — An  educational 
product  is  usually  complex,  and  its  measurement  is  as 
difficult  as  measuring  an  elephant.  There  are  many  char- 
acteristics to  be  taken  into  consideration.  We  do  not  make 
a  complete  measurement  of  the  total  fact,  but  we  measure 
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the  amount  of  some  feature.  Every  measurement  rep- 
resents a  highly  abstract  and  partial  treatment  of  the 
product.  Many  critics  who  object  to  tests  and  scales  do 
not  understand  this.  An  educational  product  invites  hun- 
dreds of  measurements.  In  making  an  automobile  each 
part  is  measured  in  its  own  peculiar  way.  Linear  measure 
is  used  for  length,  cubic  measure,  for  volume,  and  the 
strength  of  the  steel  is  tested  by  another  unit.  In  exactly 
the  same  way  the  traits,  characteristics,  mannerisms, 
powers,  and  skills  of  an  individual  would  have  to  be  sub- 
jected to  many  kinds  of  measurements  before  their  strength 
and  quantity  could  be  determined.  If  a  teacher  is  a  good 
one,  she  is  good  because  of  the  presence  or  absence  of  cer- 
tain powers,  traits,  skills,  and  mannerisms,  or  she  is  good 
because  she  has  these  characteristics  in  a  certain  propor- 
tion. Dr.  Thorndike  has  pointed  out  the  fact  that  any- 
thing that  is,  exists  in  some  quantity.  If  it  exists  in  some 
quantity,  it  can  be  measured.  Perhaps  we  cannot  measure 
it  to-day  but  we  may  be  able  to  measure  it  some  day. 

People  have  said,  and  rightly,  that  we  can  never  deter- 
mine mathematically  the  degree  to  which  a  strong  man 
and  a  noble  woman  influence  for  good  the  character  of 
the  pupils.  But  they  overlook  the  fundamental  truth  that 
in  education,  as  in  other  pursuits  of  life,  character  and 
efficiency  go  hand  in  hand.  As  school  executives  make 
practical  application  of  the  newer  scientific  tests,  no  fact 
stands  out  with  more  impressive  distinction  than  that  the 
teachers  whose  classes  make  the  best  record  are  the  teachers 
who  are  the  most  truly  successful  in  the  shaping  of  char- 
acter. 

yil.  Keeping  an  Accurate  Record  of  All  Methods  Tried 
AND  Progress  Made 

Measuring  progress  implies  a  starting  point  and  a  goal. 
The  significance  of  this  fact  should  not  be  overlooked  in 
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education.  Attempting  to  measure  progress  in  school  work, 
when  either  the  starting  point  or  the  goal  is  unknown, 
would  be  like  attempting  to  measure  the  progress  of  one 
"joyriding"  about  a  city  with  no  definite  goal  in  mind. 
All  that  could  be  said  is  that  he  had  been  out  one  hour, 
two  hours,  or  some  other  length  of  time.  If,  however, 
one  were  driving  an  automobile  from  Portland  to  Seattle, 
it  becomes  an  easy  matter  to  measure  progress,  because 
there  is  a  definite  starting  point  and  goal.  A  definite 
point  of  departure  and  a  definite  goal  is  also  needed  in 
education  if  progress  is  to  be  measured. 

VIII.  The  Cultivation  of  the  Confidence  and  the  Utili- 
zation OF  THE  Support  of  the  Public 

A  new  kind  of  confidence  must  be  created  on  the  part 
of  the  public  in  our  schools.  This  confidence  constitutes 
the  capital  with  which  the  efficient  school  system  must 
develop  its  dividends  and  activities.  In  devising  educa- 
tional procedure,  therefore,  we  must  constantly  have  in 
mind  the  intelligent  and  public-minded  portion  of  our 
citizenship.  As  education  grows  scientific  it  tends  to  be- 
come less  intelligible  to  the  public.  With  a  growing  tech- 
nical terminology  the  educational  thinker  tends  to  speak 
a  dialect  difficult  for  the  ordinary  person  to  comprehend. 
The  result  is  that  as  the  education  has  become  more  scien- 
tific it  has  tended  to  isolate  itself  from  the  understanding 
of  the  people. 

Teachers  Must  Know  Why  Sweeping-  Changes  Are 
Made. — It  has  been  said  that  there  is  a  growing  tendency 
for  some  city  superintendents  in  large  and  somewhat  cen- 
tralized systems  of  public  schools  to  make  sweeping  changes 
in  schoolroom  procedure  without  consulting,  and  what  is 
more  serious,  without  attempting  to  convince  their  teachers 
of  the  necessity  for  such  changes.  There  can  be  no  true 
profession  of  teaching  where  most  of  the  members  are 


50  EDUCATIONAL  MEASUREMENT 

required  to  carry  out  official  orders  mechanically  and 
blindly.  A  clear  understanding  of  underlying  principles 
is  essential  to  good  teaching.  In  the  art  of  administering 
to  the  intellectual  and  moral  awakening  of  childhood,  the 
spiritual  worker  should  fully  comprehend  the  meaning  of 
the  plan  and  should  have  a  familiarity  with  the  tools  with 
which  he  works.  Successful  educational  leaders  are  realiz- 
ing more  and  more  the  necessity  of  carrying  their  teaching 
staff  with  them  in  all  progressive  reforms,  not  simply  as 
a  matter  of  respect  for  the  teachers,  but  as  a  matter  of 
necessity  in  getting  the  essential  work  of  the  school  done. 
It  is  necessary  that  the  thought  of  the  leaders  be  trans- 
mitted to  the  ranl^  and  file,  to  all  trainers  of  youth,  to 
parents  as  well  as  teachers.  Our  educational  institutions 
are  not  made  by  imperial  edicts  and  bureaucratic  decrees 
handed  down  from  educational  experts.  They  must  come 
from  the  people.  It  is  not  sufficient  that  educational 
leaders  alone  should  know  the  significance  of  a  given  re- 
form or  movement.  The  public  must  understand  and 
accept  the  proposed  policy.  The  intellectual  channels  be- 
tween leaders  and  followers,  profession  and  populace,  must 
be  kept  open.  Both  teachers  and  public  must  know  the 
limitations  of  the  school.  The  leaders  can  advance  no 
farther  with  an  educational  movement  than  the  public  will 
endorse.  Tests  and  measurements,  however  technical  in 
character,  must  not  have  their  significance  clouded.  The 
public  wants  to  understand  its  educational  system  and 
what  it  is  supposed  to  do.  The  ultimate  worth  of  the 
principles  and  devices  must  be  measured  by  the  extent 
to  which  they  are  within  the  comprehension  of  the  typical 
layman.  Our  aim  must  be  the  bulwarking  of  the  cause  of 
public  education  by  a  common  confidence  in  the  schools. 
This  new  confidence  cannot  be  sustained  on  vague  theories 
nor  upon  standards  that  are  established  in  defiance  to 
either  scientific  procedure  or  common  sense. 
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The  Public  Is  Interested  in  Education. — If  our  schools 
are  to  have  credit  for  their  work,  this  new  confidence  must 
obtain.  It  is  true  that  people  as  a  whole  are  not  inter- 
ested in  pedagogy  because  they  do  not  understand  it,  and 
they  are  not  in  sympathy  with  the  pedagogues  because 
they  do  not  understand  their  subtle  minds.  But  they  are 
intensely  interested  in  education.  It  is  a  mistake  to  think 
that  the  lack  of  educational  progress  may  be  attributed 
to  public  indifference  and  its  consequences.  The  people 
are  willing  to  dip  down  into  their  pockets  to  almost  any 
depth  with  reverence  and,  as  a  rule,  without  the  slightest 
murmur. 

The  teaching  profession  has  done  but  little  to  demon- 
strate to  the  public  that  what  it  was  doing  was  worth  while. 
Results  have  always  been  recorded  in  vague,  intangible 
adjectives  such  that  the  public  could  not  understand.  That 
the  public  has  never  taken  an  intelligent  interest  in  its 
schools  is  not  its  fault,  but  that  of  the  educators  them- 
selves. For  how  can  the  public  be  expected  to  distinguish 
the  true  from  the  false  when  the  leaders  in  the  profession 
do  not  agree  on  the  simplest  fundamentals? 

The  ultimate  cause  of  the  lamentably  slow  progress 
toward  the  introduction  of  educational  reforms  may  be 
traced,  therefore,  beyond  the  province  of  the  general  pub- 
lic into  the  professional  circle  itself  to  an  inner  strife  and 
turmoil  consequent  upon  the  uncertainties  in  which  the 
entire  problem  of  elementary  education  is  involved. 

The  so-called  practical  man  is  especially  insistent  in  his 
demands  for  tangible  results. 

With  our  methods  of  reporting  results  it  is  really  won- 
derful how  liberally  the  public  has  contributed  to  the 
support  of  the  public  schools  and  how  little  the  adverse 
criticism  has  been.  So  liberal  has  the  public  been  that 
to-day  we  are  spending  more  on  education  than  all  the 
rest  of  the  world.     In  criticism  of  our  work  the  business 
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man  does  not  say  that  the  child  does  not  know  anything 
at  all  about  adding,  for  instance,  but  he  says  he  does  not 
add  with  the  proper  speed  and  accuracy.  He  does  not 
say  the  child  cannot  understand  English  at  all,  he  says 
he  is  not  intelligent  and  cannot  follow  instructions.  He 
does  not  say  the  child  cannot  read,  he  says  he  reads  too 
slowly  and  misunderstands  what  he  reads.  Why  should 
we  not,  when  we  are  dealing  with  purely  mechanical 
skills  like  the  above,  give  absolutely  definite  specifica- 
tions of  what  we  expect  and  see  to  it  that  no  child  is 
marked  ''passed"  until  these  definite  specifications  are 
reached  ? 

The  American  people  has  had  an  abiding  faith  that 
the  great  public  schools  would  somehow  eventually  push 
everyone  to  the  place  for  which  his  disposition,  talents, 
and  psychophysical  gifts,  prepare  him;  that  they  would 
discover  the  strong  and  weak  points  of  the  children  who 
will  some  day  occupy  the  places  of  responsibility.  With 
that  thought  in  mind,  and  without  demanding  a  strict 
accounting  for  the  vast  sums  so  lavishly  spent,  they  con- 
tinue to  have  faith  in  the  schools  the  accounts  of  which 
they  are  unable  to  audit  because  only  the  debit  side  of 
the  ledger  is  kept.  Science  has  changed  other  professions 
and  businesses  so  that  the  public  is  becoming  accustomed 
to  look  for  tangible  results.  Just  as  in  the  factory  when 
a  piece  of  raw  material  is  put  through  a  certain  process, 
we  look  for  tangible,  measurable  results,  so  in  the  scliool 
system,  when  we  put  a  child  through  an  educational  proc- 
ess, we  look  immediately  for  measurable  results.  If  the 
desired  result  is  still  lacking,  then  the  process  may  be 
repeated  or  a  new  process  or  method  tried.  This  is 
undoubtedly  the  only  sane  way  to  teach  school.  The  old 
method  was  to  put  the  child  through  the  process  and,  with- 
out measuring  the  result,  trust  to  luck  that  the  desired 
goal  was  reached.    If  the  boy  became  a  man  of  prominence, 
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the  school-teacher  was  liable  to  take  considerable  credit 
to  herself  for  teaching  him  correctly. 

In  this  modern  intensive  life  we  cannot  run  the  risk 
in  waiting  until  adult  life  is  reached  to  find  out  whether 
our  educational  processes  are  bringing  the  proper  results. 
It  is  then  too  late.  Society  demands  that  we  find  out 
immediately  what  changes  are  wrought  by  our  educational 
processes.  This  information  may  be  gained  only  by  hav- 
ing definite  units  of  accomplishment  and  scales  and  units 
of  measurement.  The  public  demands  that  the  results  of 
training  be  recorded  in  sensible  scientific  units  and  that 
they  be  given  immediately  after  the  child  has  been  put 
through  the  educational  process  so  that  deficiencies  may 
be  provided  for  before  the  child  has  passed  beyond  the 
influence  of  the  schools. 

Fields  of  Educational  Tests  and  Measurements. — There 
are  many  fields  into  which  the  subject  of  educational  tests 
and  measurements  may  be  divided.  Seven  are  perhaps 
definite  enough  to  deserve  special  mention.  These  are: 
(1)  the  measurements  of  intelligence;  (2)  the  measurements 
of  school  achievement;  (3)  the  measurements  of  the 
materials  of  instruction;  (4)  the  measurements  of  the 
physical  growth  of  school  children;  (5)  the  measurements 
of  the  money  cost  of  education;  (6)  the  measurements  of 
school  buildings;  and  (7)  the  measurements  of  retardation, 
acceleration,  and  elimination. 

Each  of  these  fields  is  subdivided  into  narrower  ones, 
and  some  are  re-combined  to  produce  other  fields  as,  for 
instance,  when  measures  of  intelligence  and  school  achieve- 
ments are  pressed  into  service  to  determine,  in  part,  the 
efficiency  of  a  teacher.  Space  will  not  permit  an  exhaustive 
treatment  of  these  fields  in  an  elementary  work  of  this 
kind,  any  one  of  which  furnishes  sufficient  subject  matter 
for  lengthy  analyses  and  discussions.  Nevertheless,  a  some- 
what lengthy  discussion  of  the  first  two  fields  mentioned 
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above  will  be  given  because  it  is  in  these  fields  particularly 
that  the  regular  classroom  teacher  should  acquaint  her- 
self with  the  tools  for  measuring.  It  is  in  these  fields 
that  measurements  come  in  direct  contact  with  the  learn- 
ing process  in  the  regular  classroom  work.  The  other  fields, 
especially  the  last  three,  perhaps  pertain  more  to  the 
administrative  phases  of  education,  and,  while  they  affect 
the  regular  room-teacher,  they  are  not  quite  so  intimately 
connected  with  the  regular  recitation  work.  Five  chap- 
ters will  be  devoted  to  a  discussion  of  the  first  two  fields, 
while  the  other  five  fields  will  be  discussed  in  a  single 
chapter.  The  dominant  idea  in  this  single  chapter  will 
be  to  map  out  and  orientate  the  various  divisions  rather 
than  to  treat  them  exhaustively.  In  fact,  the  discussion 
will  be  limited  to  a  few  of  the  most  characteristic  phases 
of  each  of  these  five  fields. 
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CHAPTER  III 

THE  MEASUREMENT  OF   INTELLIGENCE 

General  Statement  of  the  Problem. — It  is  a  platitude 
to  say  that  individuals  differ  from  one  another  in  intellec- 
tual achievements.  We  have  known  this  from  time 
immemorial.  The  reasons  why  they  differ  from  one  an- 
other are  known  only  in  part.  It  is  known  that  one's 
intellectual  ability  depends  upon  two  factors:  (1)  native 
capacity,  which  is  a  physical  inheritance  and  therefore 
beyond  the  control  of  the  school;  and  (2)  environmental 
conditions,  a  part  of  which  may  be  directly  controlled  by 
the  school.  It  is  the  purpose  of  this  chapter  to  discuss 
some  of  the  problems  and  technique  that  have  to  do  with 
measurements  of  the  first  factor.  The  particular  phase 
of  this  factor  that  we  desire  to  measure  is  that  known  as 
"general  intelligence,"  The  subject  is  naturally  divided 
into  a  number  of  major  problems  each  of  which  is  sub- 
divided into  a  number  of  lesser  ones.  Some  of  the  major 
problems  are:  (1)  What  is  meant  by  general  intelligence? 
(2)  What  shall  be  the  nature  of  the  tests  to  measure 
general  intelligence?  (3)  What  shall  be  the  units  of 
measurement?  (4)  What  shall  be  the  methods  of  scoring? 
Many  lesser  problems  will  arise  as  the  discussion  proceeds. 

Before  discussing  these  problems,  a  brief  statement  as 
to  the  work  which  was  being  done  in  experimental 
psychology  and  psychiatry  (that  branch  of  neurology 
which  treats  of  mental  disorders  and  of  the  organic  changes 
associated  with  them)    that   led  up   to   the   attempts  to 
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measure  intelligence  will  help  to  clarify  the  situation  and 
throw  some  light  on  the  questions  as  to  why  this  move- 
ment came  into  existence  and  why  certain  tests  were  used 
in  the  early  stages  of  the  work.  Space  will  not  permit 
more  than  the  very  briefest  statements  as  to  general  status 
of  psychology  prior  to  the  attempts  to  measure  intelligence. 

Until  very  recent  times  it  was  generally  conceded  that 
psychical  phenomena  could  not  be  subjected  to  quantita- 
tive measurements.  Real  experimental  psychology  came 
into  being  within  the  memory  of  men  now  living. 

The  honor  of  founding  quantitative  psychology  belongs 
to  Gustav  Theodor  Fechner  (1801-1887).  It  was  as  late 
as  1860  that  he  published  his  Elemente  der  Psychophysic 
which  brought  together  scattered  observations  from 
astronomy,  physics,  and  biology,  which,  together  with  his 
own  elaborate  observations  in  physics,  mathematics,  and 
physiology,  were  placed  at  the  service  of  mental  measure- 
ments. A  student  who  would  understand  the  principles 
and  general  methods  of  mental  measurements  must  stiU 
go  to  school  with  Fechner. 

Wilhelm  Wundt  (1832-1920)  did  more  to  encourage  and 
inspire  students  to  work  in  this  field  than  any  other  modern 
scientist.  He  established  the  first  psychological  laboratory 
at  Leipsic  in  1878.  Wlien  he  began  his  experimental  work 
in  psychology,  there  was  little  save  the  psychophysical 
laws  announced  by  Fechner,  reaction  times,  the  experi- 
mental physiology  of  the  senses,  and  the  early  studies  in 
brain  localization. 

At  the  time  Wundt  established  his  psychological 
laboratory,  the  general  field  of  psychology  was  beginning 
to  be  divided  into  various  fields  each  with  its  own  methods 
of  investigation.  There  were  subjective  psychology  which 
relied  wholly  on  inner  perception,  and  objective  psychol- 
ogy  which  attempted  to  perfect  and  to  supplement  inner 
perception  by  objective  means.     Objective  psychology  was 
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again  divided  into:^  (a)  experimental  or  physiological 
psychology,  which  brought  inner  perception  under  the  con- 
trol of  experimental  appliances;  and  (&)  social  psychology, 
which  sought  to  derive  general  laws  of  psychological  de- 
velopment from  the  objective  products  of  the  collective 
mind. 

"Wundt  realized  that  if  psychology  was  to  advance  it 
must  follow  the  inductive  path.  Two  inductive  methods 
were  available:  (1)  The  method  of  statistics,  which  is 
indirect  and  bears  primarily  on  the  practical  with  little 
emphasis  on  theoretical  psychology.  This  is  the  method 
that  furnishes  psychology  with  its  facts.  (2)  The  method 
of  experiment,  which,  as  Wundt  says,  is  a  principle 
applicable  over  the  whole  range  of  psychology.  It  was 
the  latter  method  in  which  Wundt  was  primarily  inter- 
ested and  the  one  that  chiefly  concerns  us  here. 

The  physiologist  E.  H.  Weber,  desiring  to  determine 
the  power  of  the  skin  and  of  the  muscle-sense  to  dis- 
criminate between  weights,  began  as  early  as  1830  to  make 
experiments  in  the  field  of  sense  perceptions.  Weber  de- 
vised schemes  for  measuring  the  relation  between  the 
intensity  of  stimuli  and  their  corresponding  sensations, 
which  relations  were  designated  by  Fechner  as  Weher's 
laivs. 

When  certain  psychological  tests  were  devised  to  in- 
vestigate special  mental  functions,  it  was  found  that  the 
results  obtained  from  these  tests  varied  with  the  degree 
of  intelligence  of  the  subjects.  When  this  discovery  was 
made,  these  tests  were  diverted  from  their  original  pur- 
pose and  were  employed  in  diagnosing  endowment  or  gen- 
eral intelligence.  For  instance,  on  investigation  it  was 
found  that  in  sensory  discrimination  the  most  intelligent 
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subjects  had  the  lowest  thresholds.  It  was  concluded  there- 
from that  all  that  was  necessary  was  to  measure  the  sub- 
ject's power  of  sensory  discrimination,  such  as  the  sensi- 
tivity to  differences  in  pitch,  or  the  ability  to  distinguish 
between  the  length  of  two  lines,  in  order  to  determine 
the  degree  of  intelligence  of  an  individual.  It  was  thought 
also  that  tests  in  memory  and  association  would  give  results 
indicative  of  the  degree  of  intelligence.  Further  investiga- 
tion, however,  proved  that  these  assumptions  were  not 
true.  It  happens  very  frequently  that  an  exceptionally 
good  memory  is  possessed  by  one  with  low  intelligence. 

From  this  humble  beginning  there  has  been  organized 
a  vast  body  of  methods  and  results  requiring  several  hun- 
dred standard  experiments  and  a  great  army  of  experts 
who  spend  most  of  their  time  in  the  laboratory  employing 
these  new  methods  of  observation  and  introspection. 

Why  Work  in  Psychological  Measurements  Was  Re- 
tarded.— The  intrinsic  difficulty  of  psychological  measure- 
ments is  undoubtedly  largely  responsible  for  their  tardy 
advent  and  lack  of  progress.  The  recesses  of  the  mind 
were  apparently  inaccessible.  But  there  are  other  reasons 
why  progress  was  not  made  in  this  science. 

Descartes  had  drawn  a  sharp  line  of  demarcation 
in  popular  thought  between  the  natural  and  the  men- 
tal sciences.  The  former  were  considered  quantitative, 
and  hence  measureable,  the  latter,  qualitative,  and  not 
measureable.  This  popular  ' '  common-sense ' '  point  of  view 
has  had  the  weight  and  inertia  of  a  settled  tradition. 
Immanuel  Kant  (1724-1804)  reinforced  the  ''common- 
sense"  view  from  the  fields  of  philosophy.  He  declared, 
in  1786,  that  psychology  never  could  obtain  the  rank  of 
pure  science.  To  overcome  the  effects  of  a  dogmatic  state- 
ment of  a  philosopher  like  Kant  was  a  difficult  thing  to 
do.  Of  course,  there  were  plenty  of  ideas  and  suggestions 
as  to  what  might  be  done.    The  path  of  scientific  progress 
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is  littered  with  brilliant  suggestions.  It  is  so  easy  to 
suggest  but  so  difficult  to  grasp  the  suggestion  and  carry- 
it  to  its  conclusion. 

Not  only  did  mental  measurements  get  a  late  start  but 
their  very  nature  makes  progress  slow.  Like  political 
constitutions,  new  branches  of  knowledge  are  not  made 
but  must  grow.  Improvements  will  be  slow  and  discus- 
sion is  one  of  the  means  by  which  they  are  made. 

Effects  of  Wundtian  Laboratories. — Just  as  the  Binet 
tests  have  occupied  the  center  of  the  stage  in  the  field  of 
intelligence  measurements,  so  Wundt  and  Wundtian 
laboratories  have  been  the  center  of  the  movements  in 
experimental  psychology  for  the  last  fifty  years.  Many 
of  Wundt 's  students,  Titchener,  Hall,  Kraepelin,  Miiller, 
Cattell,  Meumann,  and  others,  have  gone  far  beyond  him 
in  many  lines.  Dr.  Stanley  Hall  founded  the  first 
psychological  laboratory  in  the  United  States  at  Johns 
Hopkins  University  in  the  eighties.  Within  a  few  years 
Wundtian  laboratories  in  experimental  psychology  were 
established  at  Philadelphia  and  at  Columbia  by  Cattell; 
at  Cornell  by  Titchener ;  and  at  Harvard  by  Miinsterburg. 

Beginning  in  the  small  and  more  accessible  fields  of 
sensation,  experimental  methods  gradually  moved  up 
toward  the  more  complex  problems  of  association,  atten- 
tion, voluntary  movements,  wiU,  the  higher  thought  proc- 
esses, and  even  feelings.  But  little  has  been  done,  how- 
ever, in  the  last-named  field.  Of  all  mental  processes,  the 
feelings,  the  affective  states,  are  the  most  baffling  and  least 
amenable  to  measurements. 

As  early  as  1912  fifteen  of  the  more  progressive  asylums 
were  applying  Wundtian  methods  to  some  extent  in  the 
study  of  insanity.  Wundt  realized,  as  few  scientists  did, 
that  progress  in  a  science  is  bound  up  with  progress  in 
t-he  methods  of  investigation.  Every  new  instrument  used 
is  followed  by  a  series  of  new  discoveries,  and  modern 
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science  itself  originated  in  a  revolution  of  methods  in  the 
hands  of  Bacon,  Galileo,  and  others. 

Psychology'-  made  little  progress  from  the  time  of  Aris- 
totle to  Wundt  because  there  is  no  scientific  field  so 
crowded  with  presuppositions  and  prejudices  and  so 
barren  of  scientific  methods  of  investigation.  It  is  the 
spirit  of  Wundt  more  than  anything  else,  perhaps,  that 
has  led  to  the  recent  improvement  in  controlled  observa- 
tional methods  in  the  study  of  animal  instinct,  from 
tropisms  up  to  tests  of  intelligence. 

The  methods  and  tools  developed  and  employed  in  the 
psychological  laboratory  were  soon  to  fall  into  the  hands 
of  those  interested  in  mental  development  not  only  from 
the  standpoint  of  applied  science,  but  also  with  the  idea 
of  helping  some  of  the  more  unfortunate  ones  in  our 
society.  Men  motivated  by  altruistic  and  philanthropic 
ideas  sought  to  better  mankind  by  the  application  of 
science  in  the  field  of  intelligence.  It  was  the  psychiatrists 
dealing  with  abnormal  adults  who  first  wanted  to  test 
intelligence.  The  movement  was  at  first  wholly  within 
the  field  of  psychopathology.  These  men  devised  tests 
and  systems  of  tests.  By  far  the  greater  portion  of  these 
tests  took  on  the  character  of  questions  and  qualitative 
tests  rather  than  that  of  quantitatively  gradable  tests. 
Their  methods  were  open  to  many  criticisms.  The  method 
of  determining  intelligence  was  to  test  a  few  individuals 
a  great  many  times  with  one  or  more  tests  and  then  apply 
the  tests  to  a  great  many  individuals  only  once,  or  rarely 
more  than  two  or  three  times.  If  it  was  found,  on  the 
whole,  that  persons  known  to  possess  more  than  average 
intelligence  obtained  better  averages  than  the  less  intelli- 
gent, it  was  assumed  that  the  methods  employed  would 
answer  for  the  testing  of  intelligence.  Most  of  the  early 
testing  was  of  this  sort.  Their  interest  was  primarily  with 
abnormal  individuals.     Whatever  testing  the  psychiatrist 
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did  on  normal  individuals  was  simply  to  establish  norms 
for  the  measurement  of  abnormal  adults.  Whatever  data 
were  gathered  in  reference  to  normal  individuals  came 
about  as  a  "  by-product ' '  of  this  process.  They  knew  little 
about  standards  for  normal  adults  with  which  the  per- 
formances of  abnormal  subjects  were  to  be  compared,  and 
they  knew  absolutely  nothing  about  standards  for  children, 
What  is  more,  one  normal  standard  is  not  enough.  With 
children  every  age-level  must  have  its  own  standard.  The 
magnitude  of  a  defect  in  intelligence  in  a  nine-year-old 
child  can  be  determined  only  by  comparing  it  with  a 
normal  nine-year-old  intelligence. 

In  the  purely  psychological  fields  men  were  studying 
the  psychology  of  testimony,  optical  illusions,  association 
experiments,  tachistoscopic  experiments,  the  learning  of 
nonsense  syllables,  etc.  The  first  tests  made  by  psycholo- 
gists were  not  designed  as  measures  of  intelligence.  They 
seem  to  have  arisen  as  a  direct  result  of  the  individual 
differences  noted  in  the  laboratory  by  the  experimental 
psychologists.  At  first  these  individual  differences  were 
a  distinct  hindrance  because  they  made  the  establishment 
of  psychological  laws  difficult.  But  psychologists  became 
interested  in  them  for  their  own  sake,  and  once  this  oc- 
curred we  had  the  birth  of  the  test  designed  to  measure  the 
mental  differences  between  individuals.  The  first  tests  were 
concerned  with  measurements  of  specific  "faculties"  or  ca- 
pacities. They  were  tests  of  different  mental  processes  or 
of  different  states  of  consciousness.  At  first  the  tendency 
was  to  ignore  individual  difference  of  pupils.  Now  the 
demand  is  to  individualize  instruction  which  will  accen- 
tuate these  differences.  The  opinion  is  not  unanimous 
that  differences  should  be  accentuated,  but  the  tendency 
towards  individual  instruction  will  undoubtedly  bring 
about  this  result. 

Many  persons  were  using  these  individual  experiments 
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as  tests  of  intelligence.  The  chief  hindrance  both  to  the 
psychiatrist  and  to  the  psychologist  in  attempting  to 
measure  intelligence  up  to  this  time  was,  perhaps,  a  lack 
of  definite  working  hypotheses  as  to  what  intelligence  really 
is.  The  ordinary  school  examinations  would  give  them  a 
notion  of  the  pupil's  knowledge  and  of  his  external 
accomplishments,  but  they  do  not  afford  an  index  to  his 
inner  endowment,  his  mental  maturity  and  power.  What 
this  endowment  is,  and  a  means  for  measuring  it,  were 
obviously  the  next  steps  in  the  scientific  procedure  of  men- 
tal measurements.  The  more  or  less  blind  probing  with 
poor  methods  for  an  unknown  characteristic  brought  poor 
results.  They  realized  that  tests  must  be  selected  in 
accordance  with  rather  definite  hypotheses  of  the  exact 
nature  of  intelligence.  Therefore  definite  hypotheses  of 
just  what  intelligence  is,  with  better  and  more  scientific 
measuring  instruments,  were  the  crying  needs  of  the  hour. 
It  is  here  that  work  began  in  earnest  on  the  measurement  of 
intelligence. 

What  Is  General  Intelligence? — The  following  defini- 
tions of  intelligence  can,  of  course,  be  nothing  more  than 
working  hypotheses  in  this  field.  We  shall  note  a  number 
of  definitions  of  this  kind.  Stern"  defines  intelligence  as 
**a  general  capacity  of  an  individual  consciously  to  adjust 
Tiis  thinking  to  new  requirements:  it  is  general  mental 
adaptability  to  new  problems  and  conditions  of  life."  He 
thinks  this  definition  clearly  differentiates  intelligence 
from  other  mental  capacities  such  as  memory,  genius, 
talent,  etc.  Adjusting  one's  thinking  to  new  requirements 
obliterates  the  effect  of  memory;  forcing  the  examinee  to 
adapt  himself  to  the  performance  of  problems  already  set 
differentiates , intelligence  from  genius,  the  nature  of  which 


^  William  Stern,  The  Psychological  Methods  of  Testing  Intelligence, 
translated  by  Guy  Montrose  Whipple  (Warwick  &  York,  1914),  p.  3. 


64  EDUCATIONAL  MEASUREMENT 

is  to  create  the  new  spontaneously.  The  fact  that  it  is  a 
general  capacity  distinguishes  intelligence  from  talent,  the 
chief  characteristic  of  which  is  the  limitation  of  efficiency 
in  one  kind  of  content.^ 

Binet's  conception  of  intelligence  emphasizes  three  char- 
acteristics of  the  thought  process:  (1)  its  tendency  to 
lake  and  maintain  a  definite  direction;  (2)  the  capacity 
to  make  adaptations  for  the  purpose  of  attaining  a  desired 
end;  and  (3)  the  power  of  auto-criticism.*  In  his  earlier 
work  Binet  believed  that  the  essence  of  intelligence  was 
capacity  to  adjust  the  attention.^ 

Meumann's  conception  of  intelligence  is  not  quite  clear. 
At  times  he  lays  great  stress  on  the  understanding  of  the 
abstract  as  the  root  of  intelligence.  He  makes  use  of  the 
retentive  powers  of  memory  in  the  learning  of  abstract 
words;  he  holds  that  the  power  of  independent  and  crea- 
tive elaboration  of  new  products  out  of  the  material  given 
by  memory  and  the  senses  is  a  manifestation  of  intelligence. 
In  practical  affairs  intelligence,  according  to  Meumann, 
means  the  ability  to  avoid  errors,  to  adjust  one's  self  to 
his  environment,  and  to  surmount  difficulties.  He  makes 
extensive  use  of  what  is  known  as  the  '^Masselon  experi- 
ment," which  gives  the  subject  a  number  of  words  that 
are  to  be  used  in  a  sentence.  He  thinks  this  is  a  reliable 
index  of  the  maturity  of  the  associative  processes. 

According  to  Ebbinghaus,  the  essence  of  intelligence  lies 
in  comprehending  together,  in  a  unitary  meaningful  whole, 
impressions  and  associations  that  are  more  or  less  inde- 
pendent, heterogeneous,  or  even  partly  contradictory. 
"Intellectual  ability  consists  in  the  elaboration  of  a  whole 
into  its  worth  and  meaning  by  means  of  many-sided  com- 


3  Ibid.,  pp.  3-4. 

''  "L' intelligence  des  imbeciles,"  U Annie  Psychologiqtie,   1909,   pp. 
1-147. 

^  Stern,  op.  cit.,  p.  17. 
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hination,  correction  and  completion  of  numerous  kindred 
associations."  ^  He  thinks  that  every  true  instance  of  intel- 
lectual ability  may,  in  the  last  analysis,  be  reduced  to  an 
act  of  combining.  It  is  a  combination  activity.  To  test 
the  maturity  of  this  combining  activity,  he  gives  the  sub- 
ject sentences  broken  up  into  parts  with  gaps  in  them, 
words  left  out,  and  he  asks  the  subject  to  supply  the  miss- 
ing parts  so  as  to  make  the  sentences  read  correctly.  The 
same  principle  was  used  by  Healy  in  the  Picture  Com- 
pletion Tests.    It  is  also  used  in  many  other  experiments. 

Zeihen  attempts  to  measure  intelligence  by  the  use  of 
tests  of  retention,  development,  comprehension,  and  gen- 
eralization. 

Is  Intellig-ence  a  General  Faculty  of  the  Mind? — At 
the  present  time,  the  term  general  intelligence  is  commonly 
understood  to  mean  an  innate  ability  or  group  of  abilities 
that  lie  at  the  basis  of  the  acquired  intelligence  of  an 
individual.  We  know  that  intelligence  itself  is  not  in- 
born, but  only  the  capacity  to  become  intelligent.  Whether 
general  intelligence  signifies  a  single  inborn  capacity  which 
functions  in  all  situations,  or  a  large  number  of  specific 
capacities,  more  or  less  related,  which  enable  an  individual 
to  acquire  intelligent  behavior  in  many  different  activities, 
is  a  question  that  has  not  been  settled  by  psychologists. 
Spearman,  Hart,  and  Burt  explain  innate  intelligence  as 
a  "general  common  factor."  Spearman  has  developed  a 
mathematical  formula  which  shows  the  correlation  between 
the  various  faculties  of  the  mind,  and  hints  at  a  general 
common  factor  in  all  mental  performances,  which  is  known 
popularly  as  ''general  intelligence." 

Burt,  employing  in  his  investigations  the  methods  of 
Spearman,  concludes  that  there  is  a  general  function — 


^  Cf.  Lewis  M.  Terman,  The  Measurement  of  Intelligence  (Houghton 
Mifflin  Co.,  1916),  p.  46. 
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a  greatest  common  measure — permeating,  to  a  greater  or 
less  extent,  the  various  special  functions  measured  by  his 
tests.  He  thinks  that  the  measurements  obtained  are 
measurements,  more  or  less  indirect,  of  a  single  capacity, 
and  not  determined  purely  by  different  capacities  in  dif- 
ferent cases;  that  the  idea  of  an  all-around  mental  effi- 
ciency applicable  in  many  directions  is  a  legitimate 
conception.'' 

Thorndike  does  not  endorse  the  idea  of  the  existence  of 
the  common  factor  denoted  by  the  term  ''general  intel- 
ligence."   Relative  to  this  point  he  says:^ 

This  doctrine  requires  not  only  that  all  branches  of  intellectual 
activity  be  positively  correlated,  which  is  substantially  true,  but 
also  that  they  be  bound  to  each  other  in  all  cases  by  one  common 
factor,  which  is  false.  The  latter  would  require  that  no  two 
intellectual  abilities  or  branches  of  intellectual  activity  should 
be  more  closely  related  to  each  other  than  to  the  fundamental 
function  by  which  alone  they  are  supposed  to  be  related.  .  .  . 
But  unless  one  arbitrarily  limits  the  meaning  of  "all  branches 
of  intellectual  activity"  so  as  to  exclude  a  majority  of  those  so 
far  tested,  one  finds  traits  closely  related  to  each  other  but  with 
their  common  element  only  loosely  related  to  the  common  element 
of  some  other  pair.  .  .  .  The  mind  must  be  regarded  not  as  a 
functional  unit,  nor  even  as  a  collection  of  a  few  general  facul- 
ties which  work  irrespective  of  particular  material,  but  rather  as 
a  multitude  of  functions  each  of  which  involves  content  as  well 
as  form  and  so  is  related  closely  to  only  a  few  of  its  fellows, 
to  the  others  with  greater  and  greater  degrees  of  remoteness. 

Meumann  and  others  criticize  the  idea  of  a  general  faculty 
known  as  general  intelligence  and  cite  many  reasons  why 
this  hypothesis  is  contrary  to  the  facts.  The  arguments 
are  too  long  to  persent  here,  and  suffice  it  to  say  that  most 
psychologists  deny  the  existence  of  this  general  faculty. 


7  Child-Stvdy,  Vol.  4,  pp.  97-101. 
^Educational  Psychology,  Vol.  Ill,  pp.  364-366. 
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We  might  go  on  at  length  giving  definitions  of  intelli- 
gence, which  are,  at  best,  nothing  more  than  working 
hypotheses;  nevertheless,  such  hypotheses  serve  as  guides 
in  our  attempt  to  measure  an  inheritance  which  conditions 
all  our  mental  achievements. 

Inability  to  Define  Intelligence  Accurately  Does  Not 
Prohibit  Measurements. — It  may  seem  at  first  thought 
that  it  would  be  impossible  to  measure  a  thing  that  has 
not  been  defined.  This,  however,  is  not  the  case.  Stern 
points  out  that  electromotive  force  was  measured  long  be- 
fore electric  currents  were  well  understood;  that  many 
diseases  were  diagnosed  and  successfully  treated  when  very 
little  was  known  of  their  real  causes.  The  whole  science 
of  chemistry,  for  instance,  is  built  up  on  the  supposition 
that  matter  is  composed  of  molecules  and  atoms,  and  the 
definitions  given  for  them  are  at  best  but  working  hypo- 
theses. So  the  handicap  of  being  unable  clearly  to  define 
intelligence  may  not  be  as  great  a  hindrance  as  it  might 
seem.  The  above  definitions  of  intelligence  differ  more  be- 
cause of  different  points  of  view  than  because  men  do 
not  agree  as  to  what  intelligence  is.  The  tests  that  we  shall 
now  describe  attempt  to  approach  this  capacity  from  many 
angles;  hence  the  difference  in  the  tests. 

Since  the  Binet  tests  are  by  far  the  most  important, 
both  as  to  origin  and  as  to  the  fundamental  principles 
used,  a  short  historical  sketch  of  their  development  will 
throw  much  light  on  attempts  at  measuring  intelligence. 
After  this  brief  historical  statement,  we  shall  again  refer 
to  the  nature  of  intelligence  in  the  discussion  of  the  types 
of  tests  used  to  measure  it.  Then  we  shall  give  some  of 
the  most  modern  conceptions  of  it  as  discussed  by  writers 
in  this  field. 

The  Binet  Tests. — The  Minister  of  Education  in  France 
in  1904  decided  to  separate  the  subnormal  from  the  normal 
children  in  the  public  schools  of  that  nation.     For  this 
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difficult  task  he  called  upon  Alfred  Binet  to  devise  a  series 
of  tests  which  might  be  used  for  that  purpose.  Binet  had 
had  a  wide  experience  with  the  education  of  children.  He 
had  been  president  of  the  Societe  Libre  pour  V Etude  de 
r Enfant  for  a  number  of  years.  At  the  suggestion  of 
many  teachers  he  had  organized  a  committee  for  the  care 
of  abnormal  children  which  initiated  various  investigations 
relative  to  backward  children.  He  had  worked  with  chil- 
dren for  many  years  studying  their  peculiarities  and 
proclivities. 

Binet  consented  to  undertake  the  difficult  task  and,  call- 
ing to  his  assistance  the  physician,  Thomas  Simon,  devised 
a  series  of  30  tests,  the  chief  purpose  of  which  was  to 
detect  suhnormality.  After  trying  these  tests  on  203  school 
children  in  Paris,  both  Binet  and  Simon  came  to  the  con- 
clusion that  it  was  possible  to  devise  a  series  of  tests  that 
would  not  only  detect  suhnormality  but  also  serve  as  a 
definite  measure  of  mental  unfoldment.  With  this  thought 
in  mind,  they  devised  a  scale  consisting  of  54  tests  which 
they  published  in  1908.  This  scale  was  revised  and  re- 
published in  1911. 

Before  taking  up  the  problems  confronting  Binet  and 
Simon  in  making  an  intelligence  scale,  a  tabular  s\Tiopsis 
of  the  1911  Revision  as  adapted  to  American  conditions 
will  be  presented  for  purposes  of  reference.  A  general 
description  of  the  scale  will  then  follow  with  an  interpreta- 
tion of  a  number  of  the  descriptive  terms  used. 

Tabular  Synopsis  of  the  Binet-Simon  Scale,  1911  Edition 

Age  3: 

1.  Points  to  nose,  eyes',  and  mouth. 

2.  Repeats  two  digits. 

3.  Enumerates  objects  in  a  picture. 

4.  Gives  family  name. 

5.  Repeats  a  sentence  of  six  syllables. 
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Age  4: 

1.  Gives  his  sex. 

2.  Names  key,  knife,  and  penny. 

3.  Repeats  three  digits, 

4.  Compares  two  lines. 

Age  5: 

1.  Compares  two  weights. 

2.  Copies  a  square. 

3.  Repeats  a  sentence  of  ten  syllables. 

4.  Counts  four  pennies. 

5.  Unites  the  halves  of  a  divided  rectangle. 

Age  6: 

1.  DistingTiishes  between  morning  and  afternoon. 

2.  Defines  familiar  words  in  terms  of  use. 

3.  Copies  a  diamond. 

4.  Counts  thirteen  pennies. 

5.  Distinguishes  pictures  of  ugly  and  pretty  faces. 

Age  7: 

1.  Shows  right  hand  and  left  ear. 

2.  Describes  a  picture. 

3.  Executes  three  commissions,  given  simultaneously. 

4.  Counts  the  value  of  six  sous,  three  of  which  are  double. 

5.  Names  four  cardinal  colors. 
Age  8: 

1.  Compares  two  objects  from  memory. 

2.  Counts  from  20  to  0. 

3.  Notes  omissions  from  pictures. 

4.  Gives  day  and  date. 

5.  Repeats  five  digits. 
Age  9: 

1.  Gives  change  from  twenty  sous. 

2.  Defines  familiar  words  in  terms  superior  to  use. 

3.  Recognizes  all  the  pieces  of  money. 

4.  Names  the  months  of  the  year,  in  order. 

5.  Answers  easy  "comprehension  questions." 
Age  10: 

1.  Arranges  five  blocks  in  order  of  weight. 

2.  Copies  drawings  from  memory. 

3.  Criticizes  absurd  statements. 

4.  Answers  diflicult  "comprehension  questions." 

5.  Uses  three  given  words  in  not  more  than  two  sentences. 
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Age  12: 

1.  Resists  suggestion. 

2.  Composes  one  sentence  containing  three  given  words. 

3.  Names  sixty  words  in  three  minutes. 

4.  Defines  certain  abstract  words. 

5.  Discovers  the  sense  of  a  disarranged  sentence. 

Age  15: 

1.  Repeats  seven  digits. 

2.  Finds  three  rhymes  for  a  given  word. 

3.  Repeats  a  sentence  of  twenty-six  syllables. 

4.  Interprets  pictures. 

5.  Interprets  given  facts. 

Adult  : 

1.  Solves  the  paper-cutting  test. 

2.  Rearranges   a  triangle  in  imagination. 

3.  Gives  differences  between  pairs  of  abstract  terms. 

4.  Gives  three  differences  between  a  president  and  a  king. 

5.  Gives  the  main  thought  of  a  selection  which  he  has  heard 

read. 

The  Binet-Simon  scale  is  an  instrument  for  measuring 
mental  maturity.  By  maturity  we  mean  the  development 
of  native  capacity  as  a  whole  by  growth,  training,  and 
environment.  Mental  growth  is  a  gradual  increase  of 
capacity  for  learning  which  comes  as  a  result  of  the  devel- 
opment of  the  nervous  system  apart  from  all  training.^ 
By  native  capacity  or  endowment  we  mean  the  special 
capacity  for  functioning  with  which  nature  has  provided 
the  individual.  Inborn  capacity  manifests  itself  only 
through  learning.  An  individual  born  with  a  great 
capacity  to  become  intelligent,  but  denied  the  opportunity 
to  learn,  would  possess  no  intelligence.  Intelligence  must 
be  acquired. 

The  tests  in  the  Binet  scale  are  arranged  in  progressive 
steps  of  increasing  difficulty,  each  higher  step  involving 


'  Leta  S.  Hollingworth,  The  Psychology  of  Subnormal  Children  (The 
Macmillan  Co.,  1920),  p.  97. 
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the  more  tardily  appearing  functions,  such  as  reasoning, 
complex  comparisons,  the  associative  functions,  and  the 
like,  which  depend  directly  on  the  maturation  of  the  native 
capacities.  The  authors  of  the  scale  worked  on  the  assump- 
tion that  the  intellectual  ability  of  children  of  a  given  age 
tended  to  approach  a  relatively  well-marked  norm.  The 
tests  in  each  age-group  were  selected  on  this  basis.  The 
mental  scale  is  merely  the  grouping  together  of  individual 
tests  in  order  to  give  a  more  general  picture  of  the  mental 
make-up  of  the  individual.  Binet  originated  the  idea  of 
grouping  tests  for  estimating  intelligence.  For  a  long  time 
he  had  been  interested  in  the  question  of  tests  for  various 
specific  abilities.  His  work  gradually  led  him  to  a  study 
of  individual  cases,  and,  in  summing  up  the  psychological 
characteristics  of  individuals,  as  revealed  by  the  mental 
tests,  he  came  upon  the  idea  of  using  a  number  of  tests 
as  a  measure  of  the  individual's  capacity.  In  addition  to 
this,  his  theoretical  speculations  as  to  what  the  tests  were 
testing,  led  him  to  the  conclusion  that  "attention"  and 
"adaptation"  were  at  bottom  the  chief  factors  that  dis- 
tinguished the  intelligent  from  the  unintelligent.  The 
practical  situation  presented  to  Binet  of  separating  the 
normal  from  the  subnormal  children  of  France  called  forth 
the  first  actual  group  of  tests  for  differentiating  intelligent 
and  unintelligent  children.  He  was  called  upon  to  dis- 
criminate between  the  normal  and  backward  child,  and 
the  question  was  not  whether  this  or  that  child  was  better 
in  such  a  specific  thing  as  memory  or  imagination,  but 
whether  the  child  was,  in  general,  weaker  in  his  intellectual 
endowment  than  the  average  child  of  his  age.  He  there- 
fore discarded  the  individual  tests  for  specific  ability  and 
took  a  group  of  tests  which  seemed  to  cover  in  general 
the  chief  psychological  characteristics  that  go  to  make  up 
intelligence.  It  was  Binet,  therefore,  who  really  blazed 
the  trail  through  the  jungle  of  mental  measurements  and 
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left  us  a  path  which  leads  to  the  general  abilities  of  a 
child's  mental  life.  As  the  norm  or  standard  of  intelligence, 
he  took  what  the  average  child  at  each  age  could  do. 

These  two  points,  the  use  of  the  group  tests  and  the  aver- 
age performance  at  each  age  as  a  standard  of  measurement, 
form  the  basic  principles  upon  which  all  of  our  measuring 
scales  of  intelligence  now  rest. 

The  Binet  Tests  Had  Many  Innovations. — Binet  at- 
tempted to  measure  the  higher  thought  processes  instead  of 
the  simpler  ones  such  as  sensory  discrimination,  reaction 
time,  retentiveness,  and  the  like.  He  abandoned  the  older 
''faculty  psychology"  which  had  given  direction  to  most 
of  the  testing  up  to  this  time  and  set  problems  for 
the  reasoning  powers — problems  that  provoke  judgment 
about  abstract  matters  and  problems  that  draw  on  the 
discriminating  powers  of  the  examinee.  If  the  faculties 
of  the  mind  were  separate  and  distinct  so  that  they  might 
be  measured  singly  and  then  summated  to  get  an  idea  of 
general  intelligence,  the  problem  might  be  simplified.  But 
they  are  so  interwoven  and  intertwined  that  they  cannot 
be  isolated  for  measurement.  Memory,  for  instance,  cannot 
be  tested  separate  and  apart  from  attention  and  other 
faculties  because  of  the  interfunctioning  of  these  faculties. 
Of  course,  most  mental  phenomena  elude  absolute  measure- 
ments in  terms  of  amount ;  but  they  can  be  measured  in 
terms  of  their  relative  magnitude.  Although  we  cannot 
equate  them,  we  can  subject  them  to  quantitative  treat- 
ment. 

The  Constituent  Functions  of  Intelligence  Must  Be 
Broug^ht  into  Play. — Just  as  the  total  character  of  an 
individual  cannot  be  determined  by  judging  a  single  char- 
acteristic of  his  behavior,  so  his  general  intelligence  cannot 
be  determined  by  measuring  the  strength  of  one  phase  of 
his  mental  life.  We  must  test  many  processes  before  a 
comprehensive  idea  of  an  individual's  general  intelligence 
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can  be  gained.  For  instance,  if  a  completion  test  is  used, 
would  one  be  safe  in  saying  that,  because  the  child  has 
the  power  to  fill  blanks  in  which  words  had  been  left  out 
of  a  sentence  or  paragraph,  he  has  a  high  degree  of  in- 
telligence, whereas  a  child  who  cannot  do  this  does  not 
possess  intelligence  to  the  same  degree?  Or  would  a  child's 
ability  to  repeat  a  number  of  words  or  figures  be  an  in- 
fallible index  of  his  general  intelligence? 

General  intelligence  is  not  simply  the  functioning  of 
individual  processes  as  such.  It  depends  on  their  correla- 
tion and  interfunctioning.  While  there  was  much  criti- 
cism of  the  single  tests  and  systems  of  tests  used  by  the 
psychiatrist,  which  tested  only  one  phase  of  the  mind  and 
attempted  to  judge  the  whole  from  this  single  character- 
istic, there  is  also  much  criticism  of  the  Binet  tests  on 
the  same  general  principles.  This  will  be  discussed  more 
fully  under  the  criticisms  of  the  Binet  scale. 

Binet  attempted  to  make  a  scale  that  would  give  a  kind 
of  composite  picture  of  the  individual's  mentality  by  test- 
ing the  strength  of  a  limited  number  of  these  processes. 
The  critics  say  that  the  scale  is  much  better  than  the  single 
tests  of  the  psychiatrist  but  that  it  still  fails  to  test  enough 
characteristics  to  make  the  picture  anything  like  complete. 
They  say  that  because  of  the  limited  number  of  character- 
istics tested  a  lack  of  development  of  one  trait,  or  an  over- 
development of  another,  tends  to  give  a  general  result  not 
in  keeping  with  the  facts. 

The  degree  of  interfunctioning  between  the  various 
processes  may  be  illustrated  as  follows :  If  a  child 's  atten- 
tion is  nil,  or  nearly  so,  then  his  perceptive  power  does 
not  focus  long  enough  to  produce  the  cortical  alterations 
giving  memory.  Therefore,  attempts  at  measuring  memory 
also  measure  percex)tion  and  other  abilities  of  the  mind. 
In  this  way  a  clue  to  the  interfunctioning  and  interde- 
pendence of  the  various  processes  is  obtained. 
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The  Kind  of  Mental  Functions  Brought  into  Play.— 
Binet  differed  from  the  psychiatrist  and  other  psychologists 
in  the  type  of  tests  he  designed.  His  tests  were  designed 
to  test  the  higher  and  more  complex  thought  processes 
such  as  reasoning  power,  abstract  judgments,  and  the  like, 
instead  of  attempting  to  measure  sensory  discrimination, 
the  rapidity  of  reaction,  and  other  lower  and  less  complex 
powers.  Up  to  this  time  it  was  considered  impossible  to 
measure  these  complex  processes.  The  old  "faculty 
psychology"  had  given  direction  to  the  earlier  testing.  It 
seemed  to  both  the  psychiatrist  and  the  psychologist  that 
sensory  discrimination,  memory,  attention,  and  other 
processes  of  the  mind,  could  be  measured  better  if  taken 
separately  than  if  an  attempt  were  made  at  summating 
the  general  results  of  all  the  different  aspects  of  intelli- 
gence. This  might  be  true  were  it  not  for  the  innerfunc- 
tioning  of  the  various  aspects  of  the  mind  which  makes 
it  impossible  to  completely  isolate  any  one  process  for 
examination.  Binet  realized  this  fact  and,  instead  of  at- 
tempting to  measure  one  aspect  of  the  mind,  he  undertook 
to  ascertain  the  general  level  of  intelligence. 

It  is  now  generally  conceded  that  such  elementary 
processes  as  sensory  discrimination  (like  distinguishing  be- 
tween two  shades  of  color),  reaction-time  (as  the  rapidity 
witli  which  an  individual  can  tap),  and  visual  acuity  have 
but  little  to  do  with  the  higher  thought  processes,  since 
many  feeble-minded  children  have  keen  powers  in  sensory 
discrimination.  It  is  apparently  the  power  of  comprehen- 
sion, abstraction,  and  the  ability  to  direct  thought  that 
separates  the  normal  from  the  subnormal.  Hence  the  other 
tests  were  very  largely  discarded  by  Binet.  He  sought  to 
bring  into  play  only  those  mental  processes  thought  to  be 
so  closely  concerned  with  ratv  native  ahility  that  they 
would  give  him  an  insight  into  this  native  endowment.  If 
the  percentages  of  passes  did  not  increase  in  going  from 
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younger  children  to  older  ones,  the  test  was  considered  unfit 
since  it  did  not  indicate  the  degree  of  maturity  of  the 
developing  intelligence.  Or  again,  if  children  known  to 
be  bright  passed  a  certain  test  more  frequently  than  chil- 
dren known  to  be  dull,  such  a  test  was  considered  satis- 
factory. 

Establishment  of  the  Zone  of  Normality. — Binet  was 
confronted  with  a  practical  problem,  that  of  separating  the 
subnormal  from  the  normal  children  in  France.  It  is  ob- 
vious that  he  could  not  detect  the  subnormal  children  unless 
he  knew  what  was  meant  by  normal  ones.  He  had  no 
precedent  to  follow  in  this  difficult  task.  No  one  had  deter- 
mined what  mental  equipment  a  child  must  have  to  be 
considered  normal.  It  was  not  sufficient  to  determine  the 
mental  equipment  of  a  group  of  children  of  one  age  and 
take  that  as  the  standard,  because  the  standard  for  each 
age  had  to  be  established  if  the  degree  of  maturity  of  the 
child's  native  mental  endowment  was  to  be  determined. 
It  was  therefore  necessary  to  take  a  group  of  children  for 
each  age  and  determine  normality  for  that  age. 

The  idea  of  the  age-grade  method  for  measuring  intelli- 
gence did  not  come  to  Binet  until  he  had  experimented 
with  tests  for  fifteen  years.  The  provisional  scale  published 
in  1905  did  not  employ  the  age-grade  method  but  con- 
sisted of  32  tests  roughly  arranged  in  the  order  of  their 
difficulty.  Since  no  account  is  given  as  to  how  he  came 
to  employ  the  age-grade  method,  the  supposition  is  that 
in  working  with  the  data  gathered  from  the  provisional 
scale  made  in  1905  he  hit  upon  the  age-grade  idea.  Suffice 
it  to  say  that  the  age-grade  idea  was  relatively  complete 
in  his  1908  scale. 

Three  problems  confronted  him  in  this  task:  (1)  He 
must  arbitrarily  or  otherwise  choose  for  each  age  a  group 
of  children  that  he  considered  normal.  (2)  He  must 
determine  the  general  intelligence  of  each  of  these  groups. 
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(3)  He  must  determine  the  boundary  lines  which  sep- 
arate the  zone  of  normality  from  the  supernormal  on  the 
one  hand  and  from  the  subnormal  on  the  other. 

In  the  solution  of  the  first  problem,  if  he  were  to  choose 
the  children  from  the  higher  classes  in  France  his  standard 
would  be  too  high,  assuming  that  the  children  from  the 
upper  classes  had  a  larger  stock  of  native  mental  endow- 
ment than  those  from  the  lower  classes.  On  the  other  hand, 
if  the  children  selected  were  known  to  be  mtellectually 
inferior,  his  scale  would  not  be  a  fair  measure  of  what  a 
child's  mental  equipment  really  is.  He  apparently  made 
his  selection  on  the  assumption  that  the  normal  child  is 
the  so-called  average  child;  the  common,  ordinary  child; 
the  child  who  has  that  mental  equipment  possessed  by  the 
greatest  number  of  children  of  that  particular  age.  With 
the  help  of  some  of  the  teachers  he  then  chose  a  group  of 
children  for  each  age  which  he  thought  represented  the 
average  child.  His  next  problem  was  to  determine  the 
general  intelligence  of  the  groups  chosen.  This  was  done 
by  a  series  of  tests.  He  had  tests  of  varjdng  difficulty 
which  he  gave  to  each  group.  He  found  the  number  of 
children  in  each  group  who  were  able  to  pass  the  tests  and 
thus,  tentatively  at  least,  determined  the  amount  and  de- 
gree of  maturity  of  what  he  considered  the  native  mental 
endowment  of  normal  children.  His  method  of  determin- 
ing what  tests  should  belong  to  a  particular  age-level  will 
be  discussed  later. 

In  order  to  further  clarify  the  exact  problems  Binet  had 
before  him  a  diagrammatic  representation  of  the  range  of 
mentality  is  given  in  Fig.  I.  In  the  great  range  of 
mentality  from  the  idiot  on  the  one  hand  to  the  genius 
on  tlie  other  there  seems  to  be  a  gradual  increase  of  in- 
telligence, and  somewhere  near  the  middle  of  this  range 
is  a  section  which  may  be  designated  as  the  zone  of 
normality. 
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We  may  represent  the  range  and  distribntion  of  men- 
tality in  Figure  I  thus:  Let  the  left  end  of  the  line  AB 
represent  the  lowest  stages  of  mentality  and  the  right  end 
the  highest.  The  curve  is  known  as  a  normal  prohahility 
curve,  normal  frequency  curve,  or  normal  distribution 
curve.  It  will  be  described  more  fully  in  a  subsequent 
chapter.  The  number  of  cases  having  the  various  degrees 
of  mentality  is  represented  by  the  height  of  the  curve  above 
the  base  line  AB.  Thus,  at  the  left  the  small  number  of 
people  with  extremely  low  mentality  is  represented  by  the 

M 


ILLtTSTRATING    THE    DISTRIBUTION    OF    MENTAL   ABILITY  AC- 
CORDING TO  THE  NORMAL  PROBABILITY  CURVE 

line  CD.  As  the  mental  powers  increase,  the  number  of 
cases  also  increases  until  it  reaches  a  maximum  at  point  M. 
That  is,  a  line  drawn  through  point  M  perpendicular  to 
the  line  AB  is  the  longest  line  that  can  be  drawn  within 
the  figure,  and  thus  represents  the  degree  of  intelligence 
possessed  by  the  largest  group  of  people.  Not  only  is  it 
theoretically  true  that  the  distribution  of  intelligence  is 
according  to  the  normal  frequency  curve,  but  an  actual 
count  taken  of  any  considerable  number  of  unselected 
people  shows  that  the  distribution  conforms  remarkably 
well  to  the  normal  distribution  curve.    Thus  the  figures  of 
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the  Royal  Commission  of  Great  Britain,  using  the  social- 
economic  criterion  of  mental  deficiency  as  a  measure,  show 
the  distribution  at  the  low  end  of  the  curve  for  certain 
districts  surveyed  to  be:  585  idiots,  1,007  imbeciles,  and 
9,828  feeble-minded  (morons)^"  This,  of  course,  is  not  an 
unselected  group,  but  it  does  show  the  distribution  among 
people  whose  mentality  is  below  normal.  The  number  of 
cases  above  normal  then  decreases  until  at  the  extreme 
right  the  curve  comes  down  to  the  base  line  AB,  or  nearly 
so,  because  there  are  few  extremely  bright  or  intelligent 
people. 

Near  the  middle  of  these  two  extremes  along  the  line 
AB  is  a  zone  which  we  may  call  the  zone  of  normality,  the 
width  of  which  may  be  represented  by  the  line  FP.  The 
positions  of  the  points  F  and  P  are  arbitrarily  chosen ;  hence 
the  width  of  the  zone  of  normality  will  vary  according  to 
the  judgment  of  the  scientist.  It  is  important  to  note  that 
normality  does  not  mean  a  point  on  the  scale  but  the  dis- 
tance between  two  points  arbitrarily  chosen.  Reasonable 
latitude  must  be  allowed  for  the  pendulum  of  intelligence 
to  swing  a  minor  arc  to  the  right  or  to  the  left  and  still 
remain  within  the  zone  of  normality. 

The  third  problem  mentioned  above,  therefore,  is  settled 
in  an  arbitrary  way.  The  zone  of  normality  will  vary  in 
width  according  to  the  scientist,  and  no  sharp  line  of 
demarcation  wiU  separate  the  normal  from  the  subnormal. 

Haberman  defines  a  normal  individual  as  "one  whose 
reaction  to  given  stimuli  is  no  more  or  less  in  degree  and 
manner  than  a  certain  quantum  that  we  have  become  accus- 
tomed to, ' ' "  This  definition  apparently  conforms  to 
Binet's  conception  of  a  normal  individual. 

^^  The  usual  method  of  classifying  people  below  the  normal  is  to 
call  those  belonging  to  the  lowest  stage  of  mentaUty,  idiots,  and,  in 
ascending  order,  imbeciles,  feeble-minded,  retarded,  and  normal. 

^*  J.  Victor  Haberman,  The  Intelligence  Examination  and  Valviation, 
p.  2. 
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Criteria  for  Separating  the  Normal  from  the  Sub- 
normal.— There  are  various  ways  of  separating  the 
normal  from  the  subnormal.  They  are  sometimes  desig- 
nated as:  (1)  the  social-economic  criterion;  (2)  the  peda- 
gogical criterion;  (3)  the  medical  criterion;  and  (4)  the 
psychological  criterion}^ 

The  definition  of  a  feeble-minded  person  given  by  the 
Royal  Commission  of  Great  Britain,  which  investigated  the 
question  of  mental  deficiency  in  1904,  illustrates  the  social- 
economic  criterion  for  measuring  intelligence.  A  feeble- 
minded person  was  defined  as:  "one  who  is  capable  of 
earning  a  living  under  favorable  circumstances,  but  is  in- 
capable, from  mental  defect  existing  from  birth,  or  from 
an  early  age,  (a)  of  competing  on  equal  terms  with  his 
normal  fellows ;  or  ( 6 )  of  managing  himself  and  his  aif airs 
with  ordinary  prudence." 

The  "jokers"  in  a  definition  of  this  kind  are,  of  course, 
the  expressions,  "competing  on  equal  terms"  and  "or- 
dinary prudence."  These  allow  a  great  deal  of  freedom 
in  defining  a  feeble-minded  individual. 

The  pedfigogical  criterion  has  been  used  for  a  long  time 
as  a  basis  for  separating  normal  from  subnormal  children, 
the  usual  custom  being  to  call  children  feeble-minded  who 
are  retarded  pedagogically  three  years  or  more.  This 
method  is  obviously  defective  because  of  the  many  things 
that  might  cause  a  child  to  be  retarded  three  or  more  years 
and  still  have  a  normal  mind  or  one  almost  normal.  Sick- 
ness, a  late  start  in  school,  bad  eyesight,  and  poor  economic 
conditions  illustrate  the  point. 

The  medical  criterion  is  based  on  the  assumption  that 
mental  deficiency  is  analogous  to  physical  disease.  Binet 
has  tersely  stated  the  chief  objections  to  the  medical 
criterion  as  follows : 

^*  Leta  S.  HoUingworth,  op.  cit.,  p.  43. 
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Each  one  according  to  his  own  fancy  fixes  the  boundary  line 
separating  these  states.  It  is  in  regard  to  the  facts  that  the 
doctors  disagree.  In  looking  closely,  one  can  see  that  the  con- 
fusion comes  principally  from  the  fault  in  the  method  of  examina- 
tion. When  an  alienist  finds  himself  in  the  presence  of  a  child 
of  inferior  intelligence,  he  does  not  examine  him  by  bringing 
out  each  of  the  symptoms  which  the  child  manifests,  and  by 
interpreting  all  symptoms  and  classifying  them;  he  contents 
himself  with  taking  a  subjective  impression  as  a  whole  of  his 
subject,  and  of  making  his  diagnosis  by  instinct.  We  do  not 
think  we  are  going  to  far  in  saying  that  at  the  present  time  very 
few  physicians  would  be  able  to  cite  with  absolute  precision  the 
objective  and  invariable  sign  or  signs  by  which  they  distinguish 
the  degrees  of  inferior  mentality. 

The  physician  has  been  trained  to  diagnose  and  treat 
physical  disorders  primarily,  and  when  confronted  with  a 
mental  condition  reasons  very  largely  from  analogy  of  what 
he  knows  of  physical  states.^^ 

It  is,  of  course,  the  psychological  criterion  in  which  we 
are  primarily  interested.  Each  of  the  other  attempts  at 
classifying  individuals  has  depended  so  much  upon  the 
judgment  of  those  making  the  classification  that  it  is  far 
less  definite  than  it  should  be  for  practical  use. 

A  knowledge  of  the  distribution  of  intelligence  among 
the  people  of  a  nation  is  of  great  importance  both 
theoretically  and  practically.  It  is  worth  a  great  deal  to 
know  what  percentage  of  the  people  may  fall  within  the 
range  of  normality  and  to  find  out  what  positions  on  the 
scale  for  testing  mental  development  are  symptomatic  of 
social  deficiency.  It  is  also  important,  from  a  sociological 
standpoint,  to  know  that  adults  testing  below  a  certain 
score  are  so  low  in  intellectual  development  that  it  is  a 
question  as  to  whether  they  have  sufficient  equipment  to 
survive  socially.  Adults  who  test  only  ten  years  old  men- 
tally, for  instance,  are  an  uncertain  group  in  intellectual 

"  Ibid.,  pp.  47-48.  ~~ 


THE  MEASUREMENT  OF  INTELLIGENCE      81 

ability  with  the  probability  that  they  will  require  more 
or  less  social  care,  while  those  who  test  only  nine  years 
old  are  deficient  enough  to  need  continuous  care.  Goddard 
found  no  ease  at  the  Vineland  School  for  the  Feeble-Minded 
which  tested  higher  than  12  mentally.  Huey  found  but 
two  such  cases  in  the  State  Asylum  at  Lincoln,  111.,  and 
Kuhlmann  found  only  ten  at  the  Minnesota  State  School 
for  the  Feeble-Minded." 

In  the  distribution  of  intelligence  Terman  found  that 
about  60  per  cent  of  all  school  children  test  between  90 
and  110  I.  Q.  (''I.  Q."  means  intelligence  quotient  and  is 
the  quotient  found  by  multiplying  the  mental  age  by  100 
and  dividing  by  the  chronological  age) .  About  40  per  cent 
test  between  95  and  105.  An  intelligence  quotient  of  110 
to  120  is  five  times  as  frequent  among  children  of  superior 
social  status  as  among  those  of  inferior  social  standing; 
the  proportion  among  the  superior  social  group  being  24 
per  cent  of  all,  whereas  but  5  per  cent  of  those  that  belong 
to  the  inferior  social  group  test  with  an  I.  Q.  of  from 
110-120.  Not  more  than  three  children  out  of  100  score 
as  high  as  125  I.  Q.  and  only  one  in  100  as  high  as  130, 
while  an  I.  Q.  of  140  is  made  by  only  one  in  250  to  300.^^ 
Terman  makes  the  following  classification  of  intelligence 
quotients  (from  the  Stanford  Revision)  :^^ 

I.  Q.  Classification 

Above  140 "Near"  genius  or  genius. 

120-140      Very  superior  intelligence. 

110-120      Superior  intelligence. 

90-110      Normal,  or  average,  intelligence. 

80-  90      Dullness,  rarely  classifiable  as  feeble-mindedness. 

70-  80      Borderline  deficiency,  sometimes  classifiable  as 

dullness,  often  as  feeble-mindedness". 
Below  70 Definite  feeble-mindedness. 

"  Cf.  James  B.  Minor,  Deficiency  and  Delinquency,  p.  95. 
1^  Terman,  op.  cit.,  pp.  94-96. 
^^  Ibid.,  p.  79. 
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Peeble-minded  individuals  with  intelligence  quotients  be- 
tween 50  and  70  include  most  of  the  morons ;  those  between 
20  and  50,  imbeciles ;  and  those  below  20,  idiots. 

Are  Differences  in  Intelligence  of  Degree  or  of  Kind? — 
It  makes  a  great  deal  of  difference  to  those  attempting  to 
measure  intelligence  whether  the  differences  found  among 
individuals  as  to  their  intelligence  is  a  difference  in  degree 
or  a  difference  in  the  kind  of  mental  traits  and  character- 
istics individuals  have. 

If  the  difference  is  one  of  kind  then  we  might  expect  the 
normal  individual  to  have  certain  mental  traits,  powers, 
and  characteristics  not  possessed  by  the  subnormal  in- 
dividual. On  the  other  hand,  if  it  is  simply  a  difference 
in  degree,  then  the  idiot  possesses  some  of  all  the  mental 
traits,  powers,  and  characteristics  that  the  normal  in- 
dividual or  even  the  genius  has.  And  such  would  now 
seem  to  be  the  consensus  of  opinion,  for  a  search  for  a 
qualitative  difference  between  feeble-minded  and  normal 
individuals  has  failed  to  disclose  any  characteristic  or  men- 
tal trait  found  in  the  normal  individual  and  not  possessed 
by  the  feeble-minded  to  at  least  a  small  degree. 

In  his  early  work  even  Binet  considered  the  difference 
between  normal  individuals  and  subnormal  ones  to  be  one 
of  kind.  In  his  work  entitled  Mentally  Defective  Children 
he  says : 

A  second  and  totally  different  theory  is  tenable,  and  this  one 
appears  to  us  to  be  much  nearer  the  truth.  It  is  that  a  defective 
child  does  not  resemble  in  any  way  a  normal  one  whose  develop- 
ment has  been  retarded  or  arrested.  He  is  inferior  not  in  degree, 
but  in  kind.  ...  An  unequal  and  imperfect  development  is  his 
special  characteristic.  These  inequalities  of  development  may  vary 
to  any  degree  in  different  subjects.  They  always  produce  a  want 
of  equilibrium,  and  this  want  is  the  diiferentiating  attribute  of 
the  defective  child. 

In  Binet 's  later  works  he  came  to  the  conclusion  that 
the  difference  between  the  normal  and  subnormal  was  not 
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one  of  kind  but  one  of  degree.  In  his  volume  entitled  TJie 
Intelligence  of  the  Feeble-Minded  he  writes:  "We  may 
thus  pass  in  review  all  our  faculties,  and  determine  that 
not  one  is  entirely  lacking  in  them.  .  .  .  They  always  have 
them  in  some  degree.  .  .  .  The  arsenal  of  their  intellect  is 
equipped  with  all  the  weapons." 

Dr.  Norsworthy  published  the  results  of  her  experiments 
in  measuring  feeble-minded  children  in  1906.  She  tested 
a  variety  of  their  mental  traits,  and  also  the  same  traits 
of  a  number  of  normal  school  children.  In  no  case  did 
she  find  the  normal  child  having  mental  traits  not  possessed 
by  the  feeble-minded  child.  The  experiments  of  Pearson 
and  Jaedorholm  coincide  with  the  findings  of  Norsworthy 
and  also  the  later  conclusions  of  Binet.  The  conclusions 
of  these  scientists  are  further  confirmed  by  statistics.  The 
subnormal  children  occupy  the  lower  end  of  the  normal 
distribution  curve  and  the  greater  the  degree  of  subnor- 
mality  the  fewer  the  cases  found.  If  they  were  in  a  class 
by  themselves,  the  number  of  cases  would  be  less  near  the 
limits  of  the  class  and  greatest  near  the  middle.  But,  as 
was  stated  above,  the  number  grows  progressively  greater 
from  the  lowest  mentality  to  the  normal  individual. 

Choosmg  Tests  to  Measure  Intelligence. — The  selection 
of  tests  that  will  measure  general  intelligence  is  an  ex- 
tremely difficult  problem  because  of  the  complex  and  subtle 
character  of  the  mind  to  be  measured.  Many  psychological 
principles  must  be  kept  in  mind.  Just  any  kind  of  tests 
will  by  no  means  satisfy  the  conditions.  We  indicated 
earlier  in  the  chapter  that  the  type  of  test  chosen  depended 
on  what  was  conceived  to  be  the  nature  of  intelligence.  In 
the  earlier  testing  work,  for  instance,  Binet  believed  that 
the  essence  of  intelligence  was  the  capacity  to  adjust  the 
attention.  He  therefore  devised  tests  such  as  the  cancella- 
tion of  letters  in  a  specially  prepared  sheet,  so  as  to  bring 
into  play  this  faculty.    He  thought  that  the  ability  to  dis- 
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criminate  between  two  near-lying  points  of  the  compass  on 
the  skin  was  a  matter  of  attention  rather  than  sensation 
and  used  this  as  a  test  in  the  earlier  part  of  his  work.  We 
shall  enumerate  some  of  the  problems  to  be  solved  in  the 
selection  of  tests  and  criticize  briefly  their  attempted  solu- 
tions. 

Tests  Must  Not  Be  Influenced  by  External  and  Chance 
Conditions. — If  the  tests  were  affected  to  any  great 
degree  by  school  training  or  environmental  conditions  they 
would  not  be  a  measure  of  native  endowment.  Measuring 
general  intelligence  is  not  merely  measuring  what  a  child 
knows  or  has  retained;  it  is  not  a  matter  of  knowledge 
but  of  something  essentially  dynamic  behind  and  veritably 
in  this  knowledge.  It  is  interfunctioning,  harmonious  and 
correlated  in  the  normal,  but  discordant  and  disrelated  in 
the  psychopathic,  the  intellectually  defective. 

Ayres^^  criticizes  the  Binet  tests  on  the  ground  that  five 
of  them  depend  upon  the  child's  recent  environmental  ex- 
perience. It  may  be,  however,  that  his  criticisms  are  not 
just  because  the  degree  to  which  environment  is  able  to 
affect  a  child  may  reveal  his  native  endowment.  Environ- 
mental effects  are  determined  and  distinguished  in  various 
ways  from  innate  capacity.^^  One  method  is  the  effect  of 
practice  on  performance.  Other  things  being  equal,  the 
less  practice  required  to  make  a  given  unit  of  gain  the 
greater  the  initial  endowment.  Great  superiority  over  the 
average  in  intellectual  effort  indicates  a  high  degree  of 
native  endowment.  Also  the  appearance  of  definite  activi- 
ties, such  as  when  a  child  spontaneously  interests  himself 
in  music  or  literature,  indicates  that  the  child  possesses 
strong  innate  tendencies  in  those  directions.  When  prac- 
tice  in   varying   amounts   has   already   influenced   initial 

"  The  Bw-et-Simon  Measuring  Scale  of  Intelligence;  Some  Criticisms 
and  Suggestions. 

"  Cf.  Meumann,  Vorlesungen,  Vol.  II.,  pp.  306-314. 
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capacities,  the  cause  of  the  differences  in  individuals  can 
be  inferred  from  the  effects  of  equalizing  practice.^'' 
Further  practice  of  equal  amounts  will  reduce  differences 
where  these  are  not  due  to  innate  causes;  if,  however,  the 
inequalities  increase  with  further  practice,  they  are  not 
caused  by  irregularities  in  previous  practice  and  are  there- 
fore due  to  innate  conditions.^" 

Only  Those  Tests  Must  Be  Chosen  That  Afford  a 
Decided  and  Reliably  Symptomatic  Value,  General 
Applicability  and  Possibility  of  Objective  Evolution. — 
As  a  matter  of  fact,  a  scale  for  the  measurement  of  intel- 
ligence is  more  limited  in  scope  than  the  above  description 
would  suggest,  since  it  omits  a  great  many  capacities  or 
abilities  that  are  not  supposed  to  be  indicative  of  the  men- 
tality of  an  individual.  For  example,  there  are  tests  for 
the  ability  to  discriminate  two  points  on  the  skin  and  for 
the  ability  to  discriminate  different  shades  of  color ;  but  we 
do  not  include  these  tests  in  our  scales  of  intelligence,  be- 
cause it  is  not  believed  at  the  present  time  that  such  tests 
have  diagnostic  value  for  distinguishing  between  different 
grades  of  intelligence. 

Tests  Must  Not  Depend  Too  Much  on  the  Ability 
to  Use  Language. — Just  how  much  the  ability  to  use 
language  is  indicative  of  intelligence  is  a  question.  Have 
we  a  valid  test  when  the  ability  to  pass  it  depends  not 
merely  on  the  comprehension  of  language  but  also  upon 
the  ability  to  frame  the  adequate  language  response?  If 
one  depends  on  definitions  too  exclusively,  he  may  be 
measuring  school  achievements  rather  than  raw  intelli- 
gence. It  is  evident  that  it  would  limit  and  perhaps 
vitiate  the  results  a  great  deal  if  too  much  emphasis  were 
put  on  the  language  factor. 

^^  Edward  L.  Thorndike,  Educational  Psychology,  Vol.  Ill,  p.  305. 
'"  Robert  R.  Rusk,  Experimental  Ediication  (Longmans,  Green  &  Co., 
1919),  p.  162. 


86  EDUCATIONAL  MEASUREMENT 

This  language  difficulty  inherent  in  the  Binet  Scale  and 
in  all  the  revisions  of  it  became  very  pronounced  as  soon 
as  the  use  of  the  scale  spread  to  workers  in  the  various 
fields.  The  clinical  psychologists  in  the  large  cities  were 
face  to  face  with  the  problem  of  the  foreign  child,  the 
speech-defective,  and  the  deaf  children.  It  was  obvious 
that  the  Binet  Scale  was  not  adequate  for  the  mental 
examination  of  such  cases.  Other  tests  not  involving 
language  were  introduced,  and  these  gave  rise  to  the  type 
now  generally  known  as  performance  tests,  which  will  be 
described  later. 

The  essential  characteristic  of  this  type  of  test  is  that 
it  shall  not  require  any  kind  of  language  response  on  the 
part  of  the  child  for  an  adequate  performance  of  the  test. 
It  is  true  that  seven  of  the  Binet  tests  depend  on  a  child's 
ability  to  read  and  write,  and  many  others  require  him 
to  express  himself  through  language.  The  degree  to  which 
this  scale  is  vitiated  because  of  its  dependence  on  language 
has  not  been  determined.  It  has  been  determined,  however, 
that  there  is  a  high  correlation  between  the  ability  to  use 
language  and  general  intelligence. 

Determining  the  Age  to  Which  a  Test  Should  Be 
Assigned. — One  of  the  difficult  problems  in  devising  a 
mental  scale  has  been  the  determination  of  the  particular 
age  to  which  the  parts  should  be  assigned.  Suppose,  for 
instance,  we  take  one  of  the  questions  Binet  assigned  to  the 
five-year-old  group.  The  examiner  repeats  a  sentence  with 
ten  syllables  in  it  and  asks  the  examinee  to  repeat  it  after 
him.  Suppose  15  per  cent  of  the  children  three  years  old 
can  do  it  successfully,  40  per  cent  of  the  four-year-old 
children,  75  per  cent  of  the  five-year-olds  and  100  per  cent 
of  the  six-year-olds.  To  what  grade  should  this  sentence 
be  assigned?  Shall  we  assign  it  to  the  six-year-old  group 
because  this  was  the  lowest  group  that  was  able  to  make 
a  score  of  100  per  cent?    Or,  if  75  per  cent  of  them  are 
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able  to  do  it,  is  that  sufficient  ?  Or,  if  any  percentage  less 
than  100  is  used,  what  percentage  should  it  be  before  it  ia 
assigned  for  that  particular  age? 

In  the  standardization  of  individual  tests  the  usual  cus- 
tom has  been  to  consider  a  test  properly  placed  if  it  is 
passed  by  75  per  cent  of  the  children.  The  idea  is  that 
if  the  test  is  properly  placed  it  will  be  passed  by  50  per 
cent  of  the  children  who  may  be  considered  of  average 
intelligence,  plus  25  per  cent  of  the  children  who  are  con- 
sidered above  the  average.^^  Of  course,  theoretically  at 
least,  25  per  cent  who  are  assumed  to  be  below  normal  will 
be  unable  to  pass  the  test. 

Binet's  guiding  principles  in  the  arrangement  of  tests 
were:  (1)  Find  an  arrangement  of  the  tests  which  would 
cause  the  average  child  of  any  given  age  to  test  "at  age"; 
that  is  the  average  six-year-old  must  show  a  mental  age 
of  six  years,  the  average  eight-year-old  a  mental  age  of 
eight,  and  so  on.  (2)  In  order  to  obtain  this  result  he 
found  that  it  was  necessary  to  locate  an  individual  test 
in  that  year  where  it  was  passed  by  about  two-thirds  to 
three-fourths  of  the  unselected  children. 

Terman^^  considers  the  proper  assembling  of  the  tests 
one  of  Binet  's  biggest  problems  and  one  in  which  he  failed 
in  many  instances,  since  many  of  the  tests  were  misplaced 
as  much  as  one  year  and  one  was  misplaced  six  years. 

There  are  many  objections  to  this  method  of  locating 
tests;  but  on  the  whole  it  seems  to  be  about  as  good  as  we 
are  able  to  get  at  the  present  time. 

Problems  in  Scoring. — There  are  certain  problems  in- 
cident to  scoring  the  tests  which  deserve  special  treatment. 
We  shall  state  briefly  what  these  problems  are  and  note 
Binet's  solution  of  them. 

21  S.  D.  Porteus,  Condensed  Guide  to  the  Binet  Tests  (published  by 
the  Training  School,  Vineland,  N.  J.,  April,  1920),  Part  I,  p.  10 . 
^^Terman,  op.  cit.,  p.  48. 
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TJie  All-or-None  Method  in  Scoring. — The  question  here 
is  as  to  whether  or  not  credit  should  be  given  for  a  test 
done  only  in  part.  For  instance :  A  child  is  asked  to 
count  backwards  from  20  to  0.  Suppose  he  makes  three 
errors  but  counts  the  rest  correctly.  Should  any  credit 
be  given  for  such  work?  Binet  said  no.  He  held  that 
a  child  must  complete  each  test  before  any  credit  is  given. 
This  method  of  scoring  has  been  severely  criticized  by 
many  writers,  notably  Yerkes,  Bridges,  and  Hardwick.^' 
The  merits  of  the  all-or-none  method  of  scoring  will  be 
discussed  more  fully  under  A  Point  Scale  for  Measuring 
Mental  Ability  in  the  next  chapter. 

S'fiall  a  Child  Be  Required  to  Pass  All  the  Tests  at 
Each  Age-Level? — As  was  indicated  above,  each  age-level 
had  from  five  to  seven  tests.  The  question  arises  as  to 
whether  or  not  it  is  necessary  for  a  child  to  pass  all  the 
tests  at  each  age-level  before  he  shall  be  allowed  to  try 
those  of  a  higher  age-level.  Binet  realized  that  children 
had  lapses  in  attention  and  that  all  the  processes  of  the 
mind  did  not  develop  with  the  same  rapidity.  Therefore, 
in  order  to  take  cognizance  of  these  characteristics,  he  ar- 
bitrarily said  that  if  a  child  passes  all  the  tests  in  any 
age-level  save  one,  he  shall  be  considered  to  belong  to  that 
age-level.  For  instance,  at  the  five-year  level  there  are 
five  tests.  A  child  passing  four  of  them  would  be  con- 
sidered five  years  old  mentally.  But  suppose  a  child  should 
pass  all  the  tests  in  the  five-year  level,  and  two  in  the 
six-year  level,  how  would  we  reckon  his  age  ?  Binet  solved 
this  problem  as  follows :  He  took  as  a  basis  the  highest  age 
at  which  the  child  made  a  perfect  score;  that  is,  the  age 
at  which  he  passed  all  the  tests  or  all  save  one.  Then  for 
every  five  tests  he  passed  beyond  this  age,  one  more  year 
was  added  to  his  mental  age. 

"'  A  Point  Scale  for  Measuring  Mental  Ability  (Warwick  &  York, 
1915). 


i 
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It  sometimes  happens  that  a  child  will  pass  all  the  tests 
in  the  five-year  group,  for  instance,  fail  in  two  or  three 
of  them  in  the  six-year  group,  and  then  pass  all  in  the 
seven-year  group.  In  a  case  of  this  kind  the  problem  is. 
What  age-level  shall  be  taken  as  the  basis  for  computing 
mental  age?  Yerkes  and  Bridges  especially  call  attention 
to  this  point. 

With  What  Tests  Shall  the  Examination  Begin? — 
Suppose  one  were  called  upon  to  test  a  boy  eight  years 
old,  where  on  the  scale  should  he  begin?  Should  he  begin 
with  the  easiest  tests,  that  is,  the  tests  for  the  three-year- 
olds,  and  go  as  high  on  the  scale  as  the  boy  can  go,  or 
should  one  begin  with  some  other  age-level,  as  the  eight- 
year-old  level,  and  determine  the  boy's  mental  age  by 
going  up  if  the  boy  can  pass  tests  above  that  age,  or  down 
if  he  is  unable  to  pass  the  tests  at  this  age-level?  Binet 
solved  this  problem  by  the  trial-and-error  method.  Taking 
one  case  with  another,  his  best  guess  would  be  that  the 
child  is  normal  or  nearly  so.  Therefore  he  would  start 
enough  below  the  normal  age  to  find  an  age  group  he  was 
pretty  sure  the  child  could  pass  and  proceed  with  the  tests 
until  the  child  was  unable  to  pass  any  tests  of  a  higher 
age-level. 

In  making  these  tests  it  was  assumed  that  a  certain  men- 
tal level  normally  goes  with  a  certain  chronological  age, 
so  that  the  relation  of  mental  age  to  chronological  age 
indicates  the  amount  of  discrepancy  between  the  amount  of 
intelligence  present  and  that  required  for  normality.  The 
custom  has  been  to  compute,  in  a  simple  way,  the  difference 
between  the  two  ages,  which,  when  negative,  gave  the  ab- 
solute mental  retardation  and,  when  positive,  the  mental 
acceleration.  This  method  is  open  to  criticism,  however. 
It  has  been  shown  that  the  increments  from  age  to  age 
are  not  the  same  and  that  a  child  chronologically  ten  years 
old  and  retarded  two  years  does  not  have  the  same  retarda- 
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tion  as  a  child  chronologically  eight  years  old  and  retarded 
two  years. 

The  "At  Age"  and  the  Normal  Child. — Care  must  be 
taken  to  distinguish  between  the  "at  age"  child  and  the 
normal  child.  The  former  is  one  whose  mental  and 
chronological  ages  are  the  same.  But  a  child  may  be  con- 
sidered normal  if  he  is  mentally  a  few  months  younger  or 
older  than  the  "at  age"  child.  In  other  words,  the  "at 
age"  children  simply  constitute  the  middle  section  of  the 
zone  of  normality  although  children  advanced  or  retarded 
a  few  months  are  still  within  the  accepted  bounds  of 
normality. 

Binet  Tests  Give  More  Than  a  Composite  Picture. — 
"While  it  is  the  purpose  of  the  Binet  Scale  to  give  a  com- 
posite picture  of  a  child's  mentality,  it  nevertheless  gives 
valuable  insight  into  some  of  the  detailed  aspects  of  his 
mental  functioning.  Meumann  points  out  that  there  are 
three  distinct  types  of  tests:  (a)  tests  of  capacity  or  endow- 
ment, (&)  tests  of  maturity  or  development,  and  (c)  tests 
of  environment  or  training.  Bearing  in  mind  this  analysis 
of  the  scale  the  examiner  may  learn  not  only  that  the  child 
is  either  normal,  subnormal,  or  supernormal,  but  he  may 
also  learn  in  the  case  of  subnormality,  for  instance,  whether 
the  defect  is  due  to  training  or  to  native  capacity.  The 
giving  of  a  conventional  list  of  facts  displays  the  quality 
of  training,  for  instance,  whereas  the  repetition  of  auditory 
digits  or  sentences  displays,  inferentially,  at  least,  native 
capacity. 

Coefficient  of  Mental  Ag-e  and  the  Intelligence  Quotient. 
— Binet 's  method  of  expressing  the  relation  between  a 
child's  mental  and  chronological  ages  was  simply  to  say 
that  the  child  was  six  years  old  chronologically,  for  in- 
stance, and  six-and-a-half  years  old  mentally,  or  six  years 
old  chronologically  and  seven-and-a-half  years  old  men- 
tally, as  the  case  might  be.    Stern  suggested  a  new  method 
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of  evaluating  the  results  of  the  scale.  His  plan  was  to 
divide  the  number  of  tests  a  child  actually  passed  by  the 
number  he  ought  to  pass  and  call  the  quotient  the  coeffi- 
cient of  mental  age.  By  this  scheme  a  child  six  years  old 
chronologically,  for  instance,  should  pass  aU  the  tests  up 
to  and  including  those  of  the  sixth-year  group,  or  20  in 
all.  If  he  passed  but  15,  his  coefficient  of  mental  age 
would  be  0.75.  If,  however,  he  should  be  seven  years  old 
mentally,  his  coefficient  of  mental  age  would  be  1.17,  etc. 

Dr.  Terman  uses  the  intelligence  quotient,  sometimes 
spoken  of  as  the  I.  Q.,  for  the  purpose  of  expressing  the 
relation  between  the  chronological  and  mental  ages  of  chil- 
dren. It  is  found  by  dividing  the  mental  age  by  the 
chronological  age  just  as  Stern  suggested  in  determining 
the  coefficient  of  mental  age ;  but  instead  of  using  fractions 
to  represent  this  relation,  Terman  multiplied  the  quotient 
by  100  to  avoid  them.  Thus  the  "at  age"  child  by  the 
Terman  method  of  recording  age  is  given  an  I.  Q.  of  100, 
and  any  child  is  considered  normal  whose  I.  Q.  is  between 
90  and  110. 

Limitations  of  the  Tests. — The  Binet  Scale  does  not 
pretend  to  measure  the  entire  mentality  of  the  subject. 
The  emotions,  for  instance,  are  measured  only  in  the  re- 
motest way,  if  at  all.  It  is  not  claimed  that  the  scale 
reveals  special  talent,  and  for  this  reason  will  not  serve  as 
a  detailed  chart  for  the  vocational  guidance  of  children. 
It  will,  however,  roughly  hound  the  limits  within  which 
one's  intelligence  will  permit  success.  Sharp  lines  of  de- 
marcation are  not  to  be  expected  between  the  various  de- 
grees of  mentality. 

The  Ag-e  of  Mental  Maturity. — Like  other  body  tissues, 
the  nervous  system,  which  is  the  physiological  basis  of 
mental  life,  does  not  grow  indefinitely.  Since  psychologists 
cannot  directly  measure  the  course  of  growth  of  the  nervous 
system  in  living  children,  they  measure  it  indirectly  by 
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measuring  the  hehavior  of  the  child  in  taking  mental  tests. 
Scientists  are  not  agreed,  from  tests  thus  far  made,  as  to 
just  when  '^ native  intelligence,"  or  mental  growth,  has 
reached  maturity.  Practically  all  are  agreed  simply  that 
maturity  comes  somewhere  in  the  'teens  and  that  beyond 
this  there  is  little  or  no  development  of  this  native  capacity. 
Terman  says  :^* 

Native  intelligence,  in  so  far  as  it  can  be  measured  by  tests 
now  available,  appears  to  improve  but  little  after  the  age  of 
15  or  16  years.  It  follows  that  in  calculating  the  I.  Q.  (intelli- 
gence quotient)  of  an  adult  subject  it  will  be  necessary  to  dis- 
regard the  years  he  has  lived  beyond  the  point  where  intelligence 
attains  its  final  development.  Although  the  location  of  this  point 
is  not  exactly  known,  it  will  be  sufficiently  accurate  for  our  pur- 
pose to  assume  its  location  at  16  years".  Accordingly,  any  person 
over  16  years  of  age,  however  old,  is  for  purposes  of  calculating 
I.  Q.  considered  to  be  just  16  years  old.  If  a  youth  of  18  and 
a  man  of  60  years  both  have  a  mental  age  of  12  years  the  I.  Q. 
in  each  ease  is  12  -f- 16  or  0.75. 

Porteus  claims  that  the  recent  work  of  the  army  ex- 
aminers has  shovTn  that  the  age  of  maturity  is  much  below 
the  level  set  by  Terman  and  that  if  the  Terman  standards 
are  used,  it  is  possible  to  diagnose  an  adolescent  as  being 
at  the  feeble-minded  level  when  as  a  matter  of  fact  he  is 
comparatively  little  below  the  average  of  the  ordinary 
population, 

Yerkes  and  Bridges  say  in  connection  with  this  question 
that  "it  seems  highly  probable  that  the  adult  level  is  at- 
tained as  early  as  the  sixteenth  year."  ^^ 

Spearman  and  Kuhlmann  think  that  at  the  age  of  15 
tlic  native  intelligence  is  mature.  The  former  says  the 
fact  that  mental  ability  reaches  its  full  development  about 
the    period    of    puberty    is    still    further    evidenced    by 


''^Terman,  op.  cit.,  pp.  140-141. 

*^  A  Point  Scale  for  Measuring  Mental  Ability,  p.  64. 
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physiology.  For  the  human  brain  has  been  shown  to  attain 
its  maximum  weight  between  the  ages  of  10  and  15  years.^^ 
On  the  other  hand,  Wallin  thinks  we  should  have  more 
evidence  before  choosing  a  fixed  age  for  the  maturity  of 
native  intelligence.-'^ 

Criticisms  of  the  Binet  Tests. — The  criticisms  of  the 
Binet  tests  are  many  and  varied.  Some  are  favorable  and 
some  are  unfavorable.  "We  shall  note  first  some  of  the 
unfavorable  criticisms.  One  of  the  most  radical  and  severe 
criticisms  of  the  Binet  Scale  has  been  made  by  Haberman, 
who,  speaking  of  the  Binet  tests,  says:^^  "Like  the 
Freudian  delusion,  this  Binet  dilettanteism  has  taken  us 
by  storm  and  shows  splendidly  our  native  gullibility  and 
likewise  an  interesting  phase  of  the  hysterical  lay-medical 
activity  of  the  times." 

One  of  the  early  critics  of  the  Binet  Scale  was  Dr. 
Leonard  P.  Ayres.  He  criticized  the  tests  from  the  stand- 
point of  their  content.  His  criticisms  fall  under  six  heads : 
(1)  The  tests  depend  predominantly  on  the  child's  ability 
to  use  words  fluently,  and  only  to  a  limited  extent  to 
perform  acts.  (2)  Five  of  the  tests  depend  upon  the 
child's  recent  environmental  experience;  hence  it  is  ques- 
tionable whether  they  test  native  endowment.  (3)  Seven 
of  them  depend  upon  his  ability  to  read  and  write,  which 
again  raises  the  question  as  to  whether  they  test  native 
ability  or  school  achievement.  (4)  Too  great  weight  is 
given  to  tests  of  ability  to  repeat  words  and  numbers. 
(5)  Too  great  weight  is  given  to  ''puzzle  tests."  (6) 
Unreasonable  emphasis  is  given  to  tests  of  ability  to  de- 
fine abstract  terms.     Since  Ayres  made  his  criticisms  in 

*C.  Spearman,  "The  Heredity  oi  Ahihiy,"  Eugenics  Review,  1914, 
pp.  229-237. 

^  J.E.  Wallace  Wallin,  "Re-Averments  Respecting  Psychoclinical. 
Norms  and  Scales  of  Development,"  Psychological  Clinic,  1913, 
pp. 89-97. 

^Haberman,  op.  cit.,  p.  14. 
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1911  much  has  been  learned  about  the  Binet  tests,  and 
we  now  know  that  many  of  the  criticisms  made  by  him 
are  not  so  serious  as  they  looked  at  that  stage  of  their 
development. 

The  main  defects  of  the  Binet  Scale  according  to 
Meumann  are:  (1)  The  single  tests  are  not  rightly  graded 
according  to  their  difficulty.  (2)  The  tests  of  each  kind 
are  not  sufficiently  numerous,  and  those  of  a  particular 
kind  are  not  repeated  year  after  year  so  as  to  trace  the 
child's  development  in  the  various  faculties  of  the  mind. 
The  different  age-groups  deal  with  entirely  different  men- 
tal functions ;  for  instance,  one  of  the  tests  in  the  five-year 
group  is  to  repeat  a  sentence  of  ten  syllables.  This  is 
a  test  in  auditory  memory.  The  next  time  the  child  is 
called  upon  to  show  his  ability  in  auditory  memory  is 
when  he  is  asked  in  the  eight-year  group  to  repeat  five 
digits.  The  interval  is  three  years  and  cannot,  therefore, 
be  a  very  accurate  measure  of  a  child's  rote  memory 
development.  (3)  The  tests  determine  quite  different 
capabilities ;  that  is,  they  are  not  systematically  arranged 
according  to  definite  points  of  view.  (4)  The  exact  sum 
of  the  whole  testing  has  not  been  decided  upon. 

The  scale  is  further  criticized  on  the  ground  that  there 
are  two  tests  in  some  age-groups  which  depend  on  memory. 
This  gives  undue  emphasis  to  one  particular  trait  to  the 
exclusion  of  others,  which  may  be  as  much  of  a  measure 
of  mentality  as  memory. 

Limits  of  Traits  Not  Determined  by  the  Binet  Scale. — 
The  Binet  Scale  fails  to  determine  the  limits  of  a  trait 
for  two  reasons:  (1)  The  unit  of  accomplishment  is  so 
large  that  small  degrees  of  progress  are  not  recorded.  (2) 
In  one  age-group  auditory  memory,  for  instance,  may  be 
tested  whereas  in  the  next  group  a  different  kind  of  memory 
is  tested.  If  the  child  passes  the  test  in  auditory  memory 
he  is  not  tested  again  in  that  particular  trait  until  three 
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or  four  age-groups  are  passed.  Hence  his  ability  in  that 
particular  trait  is  not  determined  by  the  test  taken. 

Myers^^  points  out  that  the  determination  of  endowment 
and  the  measure  of  general  intelligence  has  been  ap- 
proached in  a  more  strictly  psychological  manner  by  tests 
different  from  the  Binet  tests.  In  his  opinion,  the  Binet 
tests  are  tests  of  production  rather  than  psychological  tests. 
He  says:  "They  determine  how  7nuch  an  individual  can 
work,  how  much  he  knows — not  how  he  works,  how  he 
knows.  A  man's  productivity,  of  course,  is  what  we  want 
to  ascertain  in  everyday  life.  We  do  not  care  how  a  man 
comes  to  use  or  acquire  his  powers ;  we  are  content  with  a 
mere  dynamometric  or  other  record  of  his  prowess.  But 
this  aspect  cannot  properly  be  called  the  psychological 
aspect. ' ' 

The  Binet  Scale  Criticized  on  Other  Points. — The  scale 
is  further  criticized  because  there  is  not  the  same  number 
of  tests  at  each  age-level.  The  four-year-old  group  con- 
tains but  four  tests  whereas  the  other  groups  contain 
five ;  and  it  may  be  easier  to  pass  three  tests  out  of  four 
than  four  out  of  five. 

The  absolute  amount  of  retardation,  that  is,  the  differ- 
ence between  the  mental  and  chronological  ages,  is  not  a 
comparable  quantity  when  different  age  levels  are  in 
question.  Or  again,  two  children  with  the  same  mental 
and  chronological  ages  may  have  a  very  wide  variation 
in  the  degrees  of  maturation  of  their  native  capacities. 
For  instance,  a  boy  ten  years  old  mentally  and  chronolog- 
ically may  have  passed  all  the  tests  consecutively  up  to 
and  including  the  ten-year  group  but  be  unable  to  go 
beyond  that  age-level.  Another  boy,  mentally  and 
chronologically  the  same  age  as  reckoned  by  the  Binet 
Scale,  may  have  been  unable  to  pass  all  the  tests  in  each 

»  British  Medical  Journal,  No.  2613,  pp.  196-197. 


96  EDUCATIONAL  MEASUREMENT 

age  group  beyond  the  eighth,  year,  but  because  he  was 
able  to  pass  ten  tests  above  the  eight-year  level  he  was 
given  a  mental  age  of  ten  the  same  as  the  other  boy.  The 
range  of  the  distribution  of  passes  and  failures  is,  as  a 
rule,  very  much  wider  in  the  feeble-minded  than  in  the 
normal  individual.  That  is,  a  subnormal  child  may  be 
very  proficient  in  one  trait  and  deficient  in  another.  Some 
investigators  have  found  it  to  be  twice  that  of  the  normal 
individual. 

A  mind  with  the  least  variation  from  the  normal  seems 
to  be  of  a  higher  type  than  one  which  varies  a  great  deal ; 
for  example,  a  child  who  passes  all  the  tests  in  the  eight- 
year  group  and  one  in  the  nine-year  group  and  can  go  no 
further  would  be  considered  of  a  higher  type  than  a  child 
who  passes  all  the  tests  in  the  seven-year  group,  two  in 
the  eighth,  and  one  in  each  of  the  next  three  groups,  mak- 
ing him  eight  years  old  mentally. 

Many  criticize  the  Binet  Scale  because  it  does  not  dis- 
play more  completely  the  character  of  the  mind  being 
tested.  Seashore,  Pyle,  and  others  demand  a  more  funda- 
mental analysis  and  a  more  exact  determination  of  mental 
ability  than  is  possible  to  obtain  by  the  Binet  tests.  Thus 
Seashore  writes:^" 

Retardation  does  not  follow  a  common  flat  level  any  more  than 
growth  does,  nor  even  nearly  so  much.  A  child  develops  one 
capacity  several  times  as  fast,  and  often  at  the  expense  of  another 
faculty.  This  differentiation  is  even  more  striking  in  retardation. 
What  is  more,  those  who  employ  the  tests  for  practical  purposes 
should  not  be  satisfied  with  a  flat  mental  age.  ...  In  a  study 
of  the  normal  individual  we  seek  to  discover  fortes  and  his 
faults,  in  short,  to  discover  his  particular  deviation  from  the 
norm  of  the  common  level.  There  is  no  reason  why  the  Binet- 
Simon  tests  should  not  develop  into  specific  measures  of  the 
relative  rank,  or  age,  of  more  specific  capacities  and  powers, 
such  as  reasoning  ability,  sensory  observation,  memory,  imagina- 

^  Journal  of  Educational  Psychology,  Vol.  3,  p.  50. 
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tion,  initiative,  emotional  life,  self-control,  etc.  A  child  may  be 
at  the  mental  age  of  six  in  one  capacity  and  twelve  in  another, 
and  the  important  thing  to  know  about  the  individual  is  this 
difference  and  direction  of  unsymmetrical  development.  It  may 
be  that  a  general  flat-age  test  must  be  retained  for  certain 
purposes,  but  even  that  must  be  interpreted  in  the  light  of 
measures  of  specific  capacities.  Only  by  extension  in  recognition 
of  this  principle  can  any  set  of  tests  be  of  permanent  value. 

Pyle  criticizes  the  tests  on  the  ground  that  there  is 
no  common  plan  running  through  the  tests  for  the  suc- 
cessive years.  He  argues  for  "a  series  of  tests  for  deter- 
mining the  degree  of  development  of  logical  memory, 
rote  memory,  attention,  imagination,  association,  and  tw^o 
aspects  of  mind  more  complex,  learning  capacity  and  rea- 
soning. ...  It  is  more  important,  it  seems  to  me,  to  knov/ 
specifically  the  condition  of  the  child  w^ith  reference  to 
the  development  of  the  separate  mental  traits  than  to 
know  his  average  performance  vi^ith  respect  to  them 
all.  "31 

Porteus^^  has  pointed  out  some  of  the  limitations  of  the 
Binet  tests  as  follows:  They  do  not  constitute  a  perfect 
instrument  of  research  and  have  failed  to  fulfill  many 
expectations  founded  on  them.  They  cannot  be  relied 
upon  for  accurate  diagnosis  of  the  highest  grades  of 
f eeble-mindedness ;  he  points  out  that  there  are  other  fac- 
tors besides  the  degree  of  intellectual  development  which 
have  a  bearing  on  social  competency.  Many  intellectually 
inferior  persons  are  judged  normal  by  social  criteria  be- 
cause they  possess  practical  abilities  not  evaluated  by 
Binet.  On  the  other  hand,  there  are  intellectually  normal 
individuals  who  are  socially  unfit  because  of  instability, 
weakness  of  temperament,  volition,  and  other  traits  not 
revealed  by  the  Binet  examination. 

"  Ibid.,  Vol.  3,  pp.  95-96. 

22  Condensed  Guide  to  the  Binet  Tests,  Bulletin  No.  19,  Training 
School  at  Vineland,  N.  J.,  pp.  1-3. 
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He  further  criticizes  the  tests  on  the  ground  that  they 
are  too  literary.  Fifty-seven  questions  out  of  74  require 
an  oral  language  response.  Language  development,  either 
from  the  standpoint  of  comprehension,  range  of  vocabu- 
lary, description,  or  defining  power,  is  the  main  capacity 
tested  in  50  per  cent  of  the  cases.  Thirty  per  cent  of  the 
tests  depend  on  previous  educational  training.  Forty- 
eight  per  cent  of  the  tests  depend  on  mediate  and  im- 
mediate memory. 

His  general  criticisms  are  that  the  tests  are  too  literary ; 
that  they  favor  the  glib-tongued,  quick-thinking  child 
who  has  had  good  educational  advantages,  who  memo- 
rizes easily  and  therefore  shows  good  scholastic  promise. 
This  is  why  the  Binet  tests  correlate  so  highly  with  school 
training.  It  has  been  mainly  through  comparison  with 
teachers'  judgments  that  the  validity  of  the  tests  has 
been  established.  There  are  cases,  however,  in  which 
children  with  high  Binet  records  show  little  power  of 
adaptation  to  school  conditions  and  also  cases  in  which 
the  class  dullard  "makes  good."  The  main  point  that 
Porteus  is  pointing  out  is  that  the  Binet  tests  will  not 
measure  accurately  all  individuals.  On  the  other  hand, 
it  is  not  necessary  to  stretch  out  one  series  of  tests  so 
thin  as  to  cover  the  whole  field  any^vay.  The  limitations 
of  the  tests  pointed  out  by  Porteus  does  not  mean  that 
the  Binet  tests  are  not  to  be  used  in  diagnosis  but  that 
they  must  not  be  used  to  the  exclusion  of  all  others  for 
diagnostic  work. 

These  defects  were  not  unknown  to  Binet.  In  regard 
to  mental  age  he  writes  thus:^^  "It  has  no  bearing  on 
the  cause  of  retardation,  nor  upon  its  peculiar  nature, 
nor  upon  the  means  of  rectifying  it."  In  regard  to  the 
fallacy  of  regarding  retardation  as  merely  equivalent  to 
a  lower  mental  age,  he  says:^* 

"  Mentally  Defective  Children,  English  trans.,  p.  12.       "  Ibid.,  p.  13. 
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A  defective  child  does  not  resemble  in  any  way  a  normal  one 
whose  development  has  been  retarded  or  arrested.  He  is  inferior 
not  in  degree  but  in  kind.  The  retardation  of  his  development 
has  not  been  unifonn.  Obstructed  in  one  direction,  his  develop- 
ment has  progressed  in  others.  To  some  extent  he  has  cultivated 
substitutes  for  what  is  lacking.  Consequently,  such  a  child  is 
not  strictly  comparable  to  a  normal  child  younger  than  himself. 

Some  Favorable  Criticisms  of  the  Binet  Scale. — In  spite 
of  its  defects  the  Binet  Scale  has  been  applied  in  prac- 
tically every  civilized  country  and  has  received  general 
approval.  Its  shortcomings  are  in  its  details  and  not  in 
the  fundamental  principles.  The  opinions  of  those  best 
qualified  to  sit  in  judgment  on  it  are  to  the  effect  that 
it  has  demonstrated  its  value  and  that  it  is  destined 
through  its  revisions  and  corrections  to  be  a  most  valuable 
instrument  in  measuring  of  mentality. 

Kuhlmann  writes  thus:^^ 

There  can  be  no  question  about  the  fact  that  the  Binet-Simon 
theses  do  not  make  half  so  frequent  or  half  as  great  errors  in  the 
mental  ages  [of  feeble-minded  children]  as  are  included  in  grad- 
ings  based  on  careful,  prolonged  general  observation  by  ex- 
perienced observers. 

Meumann  writes  as  follows:^® 

All  the  different  authors  who  have  made  these  researches  [with 
Binet's  method]  are  in  a  general  way  unanimous  in  recognizing 
that  the  principle  of  the  scale  is  extremely  fortunate,  and  all 
believe  that  it  offers  the  basis  of  a  most  useful  method  for  the 
examination  of  intelligence. 

Stern  says:^'' 

That,  despite  the  differences  in  race  and  language,  despite  the 
divergences  in  school  organization  and  in  methods  of  instruction, 

'5  Dr.  F.  Kuhlmann,  "The  Binet  and  Simon  Tests  of  Intelligence  in 
Grading  Feeble-Minded  Chadren, "  Journal  of  Ptiycho- Asthenics,  1912, 
p.  189. 

'^  Ernest  Meumann,  Experimentelle  Padagogik  (1913),  Vol.  II,  p.  277. 

^  Stern,  op.  cit.,  p.  49. 
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there  should  be  so  decided  agreement  in  the  reactions  of  children 
— is,  in  my  opinion,  the  best  vindication  of  the  principle  of  the 
tests  that  one  could  imagine,  because  this  agxeement  demonstrates 
that  the  tests  do  actually  reach  and  discover  the  general  develop- 
mental conditions  of  intelligence  (so  far  as  these  are  operative 
in  public-school  children  of  the  present  cultural  epoch),  and  not 
mere  fragments  of  knowledge  and  attainments  acquired  by  chance. 

Goddard  says:^^ 

It  is  without  doubt  the  most  satisfactory  and  accurate  method 
of  determining  a  child's  intelligence  that  we  have,  and  so  far 
superior  to  everything  else  which  has  been  proposed  that  as  yet 
there  is  nothing  else  to  be  considered. 

Goddard  not  only  defends  the  Binet  Scale  as  a  whole 
but  defends  the  age  grouping  on  the  ground  that  it  eon- 
forms  to  the  normal  distribution  curve.  He  says  that  if 
the  questions  were  not  properly  grouped,  age  for  age,  but 
were  too  hard  or  too  easy,  the  largest  group  would  not  be 
one  '*at  age"  but  would  be  a  year  below  or  a  year  above 
according  as  they  were  too  hard  or  too  easy. 

Other  Problems  That  Confront  Those  Testing  Intelli- 
gence.— A  number  of  other  interesting  and  perplexing 
problems  confront  those  attempting  to  measure  the 
growth  of  general  educational  capacity. 

1.  Does  the  defective  progress  normally  to  a  certain  point 
and  then  suffer  arrest,  or  has  mental  growth  been  retarded 
from  birth  f — The  study  of  subnormal  children  shows 
clearly  that  they  are  inferior  and  below  par  from  the  be- 
ginning. They  suffer  no  arrest  in  their  development  any 
more  than  a  normal  child  does.  They  simply  develop 
more  slowly  and  hence  never  reach  the  stage  normal 
children  reach  when  they  are  mature.     This  applies  not 


'''H.  H.  Goddard,  "The  Binet  Measuring  Scale  of  Intelligence. 
What  It  Is  and  How  It  Is  to  Be  Used,"  Training  School  Bulletin, 
Vineland,  N.  J.,  1912. 
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only  to  their  mental  development  but  also  to  their  physical 
development.  They  are  slower  in  learning  to  walk,  talk, 
creep,  sit  up,  the  appearance  of  their  teeth,  etc. 

2.  Does  the  defective  child  have  the  same  mental  equip- 
ment as  the  normal  child  of  the  same  mental  agef — Accord- 
ing to  the  best  evidence  that  can  be  secured  along  this 
line,  the  defective  child  seems  to  have  a  mental  equip- 
ment different  from  a  normal  child  of  the  same  mental 
age.  For  example,  a  child  twelve  years  old  chronolog- 
ically and  seven  years  old  mentally  differs  from  a  normal 
seven-year-old  child  in  certain  specific  habits  and  bits  of 
information.  The  defective  child  has  accumulated  certain 
of  these  habits  and  bits  of  knowledge  simply  because 
he  has  lived  longer  and  has  had  more  experience.  His 
native  endowment  is  more  nearly  matured  than  that  of 
the  normal  child.  But  maturity  in  the  defective  child 
carries  him  only  part  Avay  up  the  scale  and  it  is  in  this 
sense  that  he  is  mentally  equal  to  the  normal  child.  The 
instincts  differ,  especially  when  a  defective  adolescent  is 
compared  with  a  normal  child  of  the  same  mental  age. 

3.  Do  feeble-minded  children  mature  mentally  at  the 
same  chronological  age  as  normal  children? — Measurements 
taken  in  training  schools  show  that  defective  children 
continue  to  grow  mentally  until  well  advanced  in 
adolescence.  Their  development  is  slow,  and  there  is  rela- 
tively more  variation  in  the  development  of  the  traits 
than  in  normal  children^  but  it  may  be  said  to  be  con- 
tinuous. 

4.  Are  subnormal  children  equally  deficient  in  all  abili- 
ties f — Just  as  normal  children  develop  one  ability  more 
than  another,  so  subnormal  children  may  be  quite  profi- 
cient in  some  intellectual  traits  and  very  deficient  in 
others.  Visual  acuity  and  certain  types  of  memory,  for 
example,  may  be  as  well  developed  in  the  subnormal  child 
as  in  the  normal  and  in  some  cases  even  better.     Indeed, 
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it  is  this  marked  variation  of  mental  abilities  from  the 
norm  that  is  in  part  responsible  for  subnormality.  Not 
only  do  the  abilities,  taken  as  a  class,  develop  slowly  but 
very  irregularly.  His  several  intellectual  abilities  reach 
several  levels  on  the  intellectual  scale. 

Perhaps  enough  has  been  said  about  the  Binet  Scale 
and  the  many  problems  that  confront  those  attempting  to 
use  it.  We  have  given  a  brief  history  of  some  of  the 
more  important  problems  and  criticisms  of  the  Binet 
Scale.  All  are  agreed  that  it  is  far  from  a  perfect  measur- 
ing instrument.  It  has  been  open  to  many  criticisms.  Yet 
in  spite  of  the  criticisms,  the  scale,  nevertheless,  enables 
one  to  make  a  rough  classification  of  pupils  in  a  com- 
paratively short  time  and  by  rather  simple  means.  Its 
value  is  perhaps  practical  and  pedagogical  rather  than 
theoretical  and  psychological.  There  is  no  question  about 
its  being  considerably  in  advance  of  the  estimates  of  in- 
telligence based  on  school  performance  or  on  the  biased 
judgments  of  teachers  or  parents. 

A  summary  criticism  and  an  evaluation  of  attempts  to 
measure  intelligence  will  be  given  at  the  end  of  the  next 
chapter. 
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CHAPTER  IV 

THE  MEASUREMENT  OF  INTELLIGENCE — Continued 

The  discussion  of  the  measurement  of  intelligence  in 
this  chapter  will  be  divided  into  three  parts:  (1)  the 
extension  and  revision  of  the  Binet  Scale  and  other 
measures  of  intelligence;  (2)  group  intelligence  scales; 
and  (3)  a  summary  and  evaluation  of  measures  of  in- 
telligence. 

I.  The  Extension  and  Revision  op  the  Binet  Scale  and 
Other  Measures  op  Intelligence 

'Among  the  many  revisions  and  extensions  of  the  Binet 
Scale,  perhaps  the  most  important  one  in  America  is  that 
made  by  Dr.  Lewis  M.  Terman,  known  as  the  Stanford 
Revision.  Other  noted  revisions  and  extensions  are  those 
made  by  Goddard,  whose  revision  was  first  to  appear  in 
America,  The  Point  Scale  referred  to  above,  by  Yerkes, 
Bridges,  and  Hardwick,  and  the  translation  and  revision 
by  Kuhlmann. 

It  is  beyond  the  scope  of  an  elementary  work  of  this 
kind  to  give  a  detailed  discussion  of  the  various  revisions 
and  criticisms  of  the  Binet  Scale.  It  is  the  purpose  rather 
to  give  some  of  the  more  salient  features  of  the  attempts 
at  revisions  and  criticisms  and  to  refer  the  reader  to  a 
more  exhaustive  treatment  of  these  things  elsewhere. 

The  Stanford  Revision  of  the  Binet  Scale. — To  render 
the  Binet  Scale  applicable  to  American  conditions,  some 
reorganization  was  necessary,  first,  because  conditions  in 
America  were  different  from  those  in  France,  and  second, 
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because  it  was  felt  that  while  the  construction  of  the 
Binet  Scale  was  fundamentally  right  yet  there  were  a 
number  of  details  that  needed  revision.  According  to 
Terman's  view,  many  of  the  tests  were  misplaced.  There 
was  a  dearth  of  tests  at  the  higher  mental  levels;  the 
procedure  in  giving  the  tests  was  not  standardized;  and 
many  minor  changes  were  necessary. 

The  revision  was  a  long  process  involving  several  years 
of  tedious  work  in  examining  and  reexamining  ap- 
proximately 2,300  subjects,  1,700  of  which  were  normal 
children,  200  defectives  and  superior  children,  and  about 
400  adults. 

After  Terman  had  compared  the  data  from  tests  made 
with  the  Binet  Scale  in  various  parts  of  the  world  he 
decided  to  provide  40  additional  tests  to  the  Binet  Scale 
to  be  used  in  the  tryout  for  revision.  This  would  make  it 
possible  to  eliminate  some  of  the  least  satisfactory  ones 
and  at  the  same  time  allow  six  tests  for  each  age  group 
instead  of  five.  Care  was  taken  to  secure  children  whoso 
ages  did  not  vary  more  than  two  months  from  a  birthday ; 
that  is,  the  age  of  eight-year-old  group,  for  instance, 
ranged  from  seven  years  ten  months  to  eight  years  two 
months.  All  the  children  within  two  months  of  a  birth- 
day were  tested  in  order  to  avoid  accidental  selection. 
Tests  of  foreign-born  children  were  eliminated  in  the  final 
treatment  of  the  data  because  they  clearly  did  not  repre- 
sent a  normal  group.  The  children's  responses  to  the 
tests  were  recorded  almost  verbatim.  The  revision  of  the 
scale  below  the  14-year  level  was  based  almost  entirely 
on  1,000  unselected  children.  The  object  was  to  arrange 
the  tests  in  such  a  way  that  the  median  mental  age  of 
the  unselected  children  of  each  age-group  would  coincide 
with  the  median  chronological  age.  For  example,  a  cor- 
rect scale  must  cause  the  average  child  five  years  old 
chronologically  to  test  five  years  old  mentally,  and  the 
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average  six-year-old  child  to  test  six  years  mentally  and 
so  on. 

This  relation  was  expressed  in  what  is  known  as  the 
intelligence  quotient  discussed  in  the  previous  chapter.  The 
scale  above  the  14-year  age-level  was  based  on  the  results 
of  testing  adults,  since  children  in  the  grades  older  than 
14  would  be  classified  as  retarded.  The  validity  of  par- 
ticular parts  of  the  scale  was  determined  by  dividing  the 
children  of  each  age-level  into  three  groups  according  to 
intelligence  quotients:  (1)  those  testing  below  90;  (2) 
those  testing  between  90-109;  and  (3)  those  with  an 
intelligence  quotient  of  more  than  110.  The  percentages 
of  passes  at  or  near  that  age-level  v/as  then  ascertained 
separately  for  the  three  groups.  If  a  test  failed  to  show 
a  decidedly  higher  proportion  of  passes  in  the  superior 
group  than  in  the  inferior  group,  it  was  discarded  as  un- 
satisfactory. 

The  scale  when  finally  completed  consisted  of  90  tests, 
36  more  than  were  included  in  the  Binet  1911  Scale. 
There  are  six  tests  for  each  age-level  from  3  to  10,  eight 
at  12,  six  at  14,  six  at  ''average  adult"  age,  six  at 
"superior  adult"  age,  and  16  alternative  tests.  The  al- 
ternative tests  are  to  be  given  only  when  some  of  the 
regular  tests  have  been  rendered  unfit  bcause  of  coaching 
or  for  other  reasons. 

A  comparison  of  this  scale  with  the  1911  edition  of  the 
Binet  Scale  above  the  adult  group  shows  that  two  tests 
are  eliminated  and  29  are  relocated,  of  which  25  are 
moved  downward  and  four  upward.  Eighteen  of  the  29 
relocated  tests  were  moved  down  one  year,  four  were 
moved  down  two  years,  two  were  moved  down  three  years, 
one  was  moved  down  six  years,  three  were  moved  up 
one  year  and  six  moved  up  two  years. 

To  Find  the  Mental  Age  of  a  Child  By  the  Stanford 
Revision. — Since  there  are  six  tests  for  each  age-group 
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from  III  to  X,  each  test  passed  counts  two  months  to- 
wards mental  age.  For  instance,  if  a  child  were  to  pass 
all  the  tests  in  the  six-year  group  and  two  in  the  seventh, 
he  would  be  considered  six  years  and  four  months  old 
mentally.  Since  there  is  no  11-year  group  in  the  Stanford 
Revision  and  there  are  eight  tests  in  the  12-year  group, 
these  tests  must  cover  a  period  of  24  months  and  each 
test  passed  should  add  three  months  to  a  child's  mental 
age. 

The  Picture  Completion  Tests.— A  notable  attempt  to 
extend  the  Binet  Scale  is  found  in  the  picture  completion 
tests  devised  by  Healy.-  His  object  was  to  test  intelli- 
gence and  at  the  same  time  eliminate  the  language  factor. 

Since  modifications  of  the  Ebbinghaus  Completion 
Method  are  now  quite  extensively  used  for  language  and 
seem  to  correlate  very  highly  with  well-known  tests  of 
general  intelligence,  it  seemed  reasonable  to  Healy  to  sup- 
pose that  practically  the  same  sort  of  ability  would  be 
required  to  complete  a  picture  as  to  complete  a  sentence. 
He  therefore  devised  the  Picture  Completion  Test. 

In  giving  these  tests,  the  essential  thing  for  the  pupil 
is  to  see  that  something  is  lacking  in  the  general  situation 
in  the  picture.  The  child  is  asked  to  supply  the  missing 
parts  to  complete  the  scheme.  In  the  Picture  Completion 
Test,  the  choice  of  a  missing  part  is  limited  to  blocks 
supplied  to  the  subject,  whereas  in  the  language-com- 
pletion tests  in  general  use,  the  subject  has  the  entire 
range  of  his  vocabulary  from  which  to  supply  the  missing 
word. 

The  material  for  the  Picture  Completion  Test  consists 
of  a  brightly  colored  picture  10  by  14  inches  in  dimen- 
sions. It  represents  a  barnyard  scene  in  which  ten  simple 
activities  are  going  on.     There  is  no  obvious  connection 

2  Rudolf  Pintner  and  Margaret  M.  Anderson,  The  Picture  Completion 
Test  (Warwick  and  York,  1917),  ch.  ii. 
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between  the  activities,  but  each  is  of  such  a  nature  as 
to  appeal  to  the  childish  imagination.  An  object  neces- 
sary for  the  completion  of  any  one  of  the  activities  is 
omitted  and  it  becomes  the  task  of  the  examinee  to  find 
the  most  appropriate  object.  For  example,  two  boys  are 
playing  with  a  football.  One  of  the  boys  has  just  kicked 
it  into  the  air  and  the  other  boy  has  his  hands  in  a 
position  for  catching  it,  but  the  significant  thing,  the 
football,  has  been  omitted  and  a  blank  space  instead  of  the 
football  appears  between  the  two  boys. 

In  addition  to  the  ten  appropriate  blocks  for  completing 
the  pictures  there  are  40  others  from  which  the  subject 
may  choose,  ten  of  which  are  blank  while  the  other  30 
bear  pictures  of  objects.  The  picture  board  contains  10 
apertures  each  of  which  is  one  inch  square,  and  the  blocks 
are  so  cut  that  any  block  will  fit  any  aperture.  No  indica- 
tion of  the  correct  solution  may  be  gained  from  the  back- 
ground of  the  picture;  but  the  subject  must  grasp  the 
meaning  in  order  to  meet  the  requirements. 

A  test  of  this  kind  offers  some  measure  of  a  child's 
apperceptive  ability  and  shows  how  well  he  is  able  to 
use  his  past  experience  in  the  solution  of  new  and  novel 
situations.  This  factor  corresponds  in  the  main  to  the 
definitions  of  intelligence  given  by  Stern,  Burt,  Binet,  and 
others.  It  is  claimed  that  this  test  differs  from  the  or- 
dinary picture-puzzle  tests  inasmuch  as  it  demands  a 
choice  on  the  part  of  the  child.  Healy  claims  the  tests 
correlate  well  with  the  apparent  mentality  of  both  the 
delinquents  and  the  normal  individuals.  The  most  of  the 
work  with  this  test  has  been  done  in  corrective  and  pro- 
tective institutions. 

The  Form  Board  for  Measuring-  Intelligence.— The  form 
hoard,  which  in  some  respects  is  very  much  like  Healy 's 
Picture  Completion  Test,  was  originally  devised  by  Seguin. 
A    false    conception    of    psychology,    the    old    faculty 
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psychology,  was  responsible  for  its  creation,  but  it  has 
since  proved  to  be  of  value  in  mental  diagnosis.  Owing 
to  the  fact  that  Seguin  believed  defective  children  dif- 
ferent from  normal  children  in  kind  of  mentality,  he 
thought  they  would  demand  a  different  kind  of  test  to 
measure  their  intelligence.  He  believed  that  general  im- 
provement could  be  brought  about  by  training  in  specific 
tasks;  that  the  mind  could  be  divided  into  separate  en- 
tities such  as  attention,  will,  memory,  imagination,  and 
the  like,  and  that  training  in  each  of  these  gave  a  general 
training  of  the  whole  mind. 

The  form  board,  which  consists  of  a  board  usually  about 
14  by  20  inches  in  dimensions,  has  a  number  of  irregular- 
shaped  apertures  in  it  with  a  number  of  blocks  that  will 
fit  the  various  apertures.  Each  block  will  fit  but  one 
aperture  and  the  test  consists  in  determining  how  quickly 
pupils  will  assemble  the  blocks  into  their  appropriate 
apertures.  Feeble-minded  people  have  little  sense  of  form 
and  proportion,  hence  they  must  try  many  times  before 
they  can  find  the  right  blocks  for  the  various  apertures. 

A  Scale  of  Performance  Tests. — Pintner  and  Paterson 
have  designed  and  assembled  a  group  of  tests  with  the 
idea  of  making  a  scale  which  will  supplement  the  intelli- 
gence scales  now  in  use.  Their  scale  is  the  result  of  an 
attempt  to  measure  the  mentality  of  deaf  children  and 
had  its  beginning  in  1914.  Not  only  is  it  desirable  to 
measure  the  mentality  of  deaf  children,  but  there  are 
many  other  types  that  cannot  be  measured  by  the  or- 
dinary intelligence  scales.  The  speech  defective,  the  back- 
ward child,  the  foreign  child,  and  many  types  of  sub- 
normal children  do  not  respond  to  these  tests  in  such  a 
way  as  to  reveal  their  mentality.  A  battery  of  per- 
formance tests,  therefore,  was  thought  by  these  men  to 
be  the  sine  qua  non  to  measure  their  general  intelligence. 
The  term  ''performance  tests"  is  used  here  in  a  restricted 
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sense  and  indicates  a  group  of  tests  which  involves  a  great 
deal  of  manipulation  with  the  hands  and  a  minimum 
amount  of  language  responses. 

The  scale  as  reported  by  the  authors  consists  of  a  group 
of  fifteen  tests  as  follows  :^ 

1.  The  Mare  and  Foal  Picture  Board,  a  modification  of  the 

original  as  designed  by  Healy. 

2.  The  Seguin  Form  Board,  Twitmeyer's  adaptation  of  the 

Goddard  or  the  Goddard  Board  itself. 

3.  The  Five-Figure  Board,  devised  by  Paterson. 

4.  The  Two-Figure  Board,  devised  by  Pintner. 

5.  The   Casuist  Form   Board,   a  copy   of  the  original  board 

devised  by  Knox. 

6.  The  Triangle  Test,  devised  by  Gwyn. 

7.  The  Diagonal  Test,  devised  by  Kempf. 

8.  Healy  Construction  Puzzle  A,  devised  by  Healy. 

9.  The  Manikin  Test,  devised  by  Pintner. 

10.  The  Feature  Profile  Test,  devised  by  Knox  and  Kempf. 

11.  The  Ship  Test,  devised  by  Glueck. 

12.  The  Picture  Completion  Test,  devised  by  Healy. 

13.  The  Substitution  Tests,  devised  by  Woodworth. 

14.  The  Adaptation  Board,  devised  by  Goddard. 

15.  The  Cube  Test,  devised  by  Ivnox  and  modified  by  Pintner. 

A  brief  description  of  the  first  test  wiU  give  an  idea 
of  the  general  nature  of  the  series. 

The  Mare  and  Foal  Picture  Board  is  a  board  measuring 
29  by  2414  centimeters  and  one  centimeter  thick  upon 
which  a  colored  picture  is  pasted.  The  picture  represents 
a  mare  and  foal  in  a  field  with  two  sheep  lying  on  the 
ground  and  three  chickens  in  the  foreground.  In  the 
background  two  houses  are  seen  in  the  distance.  Eleven 
pieces  of  irregular  shape  have  been  cut  from  the  picture. 
Each  piece  represents  certain  parts  of  the  animals  or  of 


"Rudolf  Pintner  and  Donald  G.  Paterson,  A  Scale  of  Performance 
Tests  (D.  Appleton  Co.,  New  York,  1917),  pp.  23-24. 
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the  scene.  In  giving  the  test  the  board  is  placed  in  front 
of  the  child  with  the  pieces  scattered  over  the  top.  The 
instructions  are  to  put  the  pieces  in  place  as  quickly  as 
possible  without  making  any  mistakes.  The  examiner 
watches  the  performance  of  each  child  and  records  the 
number  of  errors  made.  An  error  is  an  attempt  to  put  a 
piece  in  the  wrong  place.    The  time  is  five  minutes. 

Although  the  other  14  tests  differ  from  this  one,  yet  the 
general  plan  is  the  same.  In  selecting  the  tests  for  this 
scale,  the  aim  was  to  obtain  as  many  kinds  as  possible  so 
that  the  various  factors  entering  into  the  complex  known 
as  intelligence  might  be  brought  into  play.  Care  was  taken 
not  to  select  the  performance  of  a  specific  activity  that  was 
likely  to  have  been  learned  by  the  child.  The  tests  must 
be  of  such  a  nature  that  no  verbal  instructions  are  neces- 
sary. 

A  Point  Scale  for  Measuring  Mental  Ability. — There 
have  been  many  differences  of  opinion  as  to  whether  tho 
"  all-or-none "  method  of  scoring  used  in  the  Binet 
Scale  was  yielding  the  most  satisfactory  results.  Many 
psychologists  have  been  convinced  that  a  more  nearly 
perfect  picture  of  a  child's  mentality  might  be  drawn 
if  the  scale  were  a  little  finer;  that  is,  if  the  units  of 
accomplishment  were  made  smaller  so  that  a  child  would 
get  credit  for  any  part  of  a  test  properly  completed 
instead  of  having  to  complete  the  whole  test  in  order 
to  score. 

With  this  idea  in  mind,  Dr.  Robert  M.  Yerkes,  assisted 
by  James  W.  Bridges  and  Professor  Rose  S.  Hardwick, 
undertook  the  task  of  revising  the  method  of  scoring  used 
in  the  Binet  Scale.  Their  intentions  were  at  first  simply 
to  devise  a  better  method  rather  than  to  attempt  to  modify 
the  Binet  Scale.  A  number  of  preliminary  tests  were 
given  to  approximately  1,000  school  children  and  inmates 
in  a  psychopathic  hospital  to  determine  the  value  of  the 


112  EDUCATIONAL  MEASUREMENT 

single-series  and  the  partial-credit  ideas  before  attempt- 
ing to  develop  a  more  highly  satisfactory  form  of  point 
scale.  The  main  defect  they  sought  to  correct  in  the 
Binet  Scale  may  be  illustrated  from  the  following  ex- 
ample taken  from  the  Binet  Scale. 

Suppose  three  pupils,  A,  B,  and  C,  were  asked  to  count 
backwards  from  20  to  0.  A  makes  the  count  without  error, 
B  makes  two  errors,  and  C  fails  entirely.  In  recording 
the  results  by  the  Binet  method,  A  is  given  a  perfect  score, 
and  B  and  C  are  recorded  simply  as  failures  and  are  put 
in  the  same  class.  Now  it  is  evident  that  5's  score  is  much 
nearer  like  A's  score  than  Cs.  The  point  scale  would 
give  A  a  perfect  score,  B  a  certain  number  of  credits,  and 
C  a  score  of  zero,  which  would  be  a  more  equitable  rating 
of  the  three  abilities. 

It  is  claimed  also  that  the  point  system  gives  due  credit 
to  the  more  difficult  reactions  which  are  many  times  not 
properly  weighted  by  the  other  method.  By  making  the 
units  of  accomplishments  smaller,  it  is  possible  to  note  gains 
made  in  case  the  test  is  repeated.  In  the  case  cited  above,  if 
the  test  were  again  given  to  C  and  he  was  able  to  count 
from  20  to  0  with  only  four  mistakes,  his  gain  would  be 
noted,  but  by  the  "  all-or-none "  method  no  progress  would 
be  recorded  until  he  was  able  to  make  the  count  without 
error.  It  is  claimed  by  the  authors  that  the  point  method 
of  scoring  tends  to  minimize  the  influence  of  the  personal 
equation  of  the  examiner  and,  in  doubtful  cases  where  the 
examiner  is  not  quite  sure  whether  the  examinee  made  a 
perfect  score  or  not,  the  pupil  may  be  more  accurately 
graded. 

It  is  argued  that  the  ideal  examination  question  is  one 
upon  which  the  abler  students  will  make  a  high  score  and 
on  which  the  poorly  prepared  will  be  able  to  give  some 
answer  even  though  the  answer  is  not  of  such  a  nature 
as  to  merit  a  perfect  score.     The  object  is  not  simply  to 
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know  that  certain  individuals  have  a  passing  knowledge 
of  the  topic,  while  certain  others  have  not,  but  to  laiow 
how  much  the  better  candidates  surpass  the  minimum  re- 
quirement and  to  what  degree  the  less  able  students  fall 
short  of  it.  A  point  scale  opens  the  way  for  the  classifica- 
tion of  individuals  in  more  nearly  homogeneous  groups 
than  they  may  be  classified  under  the  ' '  all-or-none ' '  method. 
The  dominating  idea  of  the  whole  scheme  is  to  take  cog- 
nizance of  small  gains;  to  make  small  units  of  accomplish- 
ment, rather  than  large  ones,  the  units  of  measurement. 
The  makers  of  the  Point  Scale  also  introduced  the  intelli- 
gence coefficient  which  is  obtained  by  dividing  the  number 
of  points  obtained  in  an  examination  by  the  number  of 
points  obtained  on  the  average.  For  example,  if  40  points 
were  the  average  for  eight-year-old  children  and  a  child 
were  to  make  36  points,  his  intelligence  coefficient  would 
be  36  -f-  40  =  0.9.  A  similar  measure  was  suggested  earlier 
but  had  not  been  put  into  practice  prior  to  the  time  of 
the  Point  Scale. 

The  testing  material  for  the  Yerkes,  Bridges,  and  Hard- 
wick  Point  Scale  was  drawn  very  largely  from  the  material 
used  by  Binet.  In  fact,  as  has  been  said,  the  original 
intentions  of  the  makers  of  the  Point  Scale  were  to  develop 
a  better  method  but  not  to  attempt  to  modify  the  Binet 
Scale.  But  as  the  work  progressed  the  makers  were  con- 
vinced that  much  of  the  material  in  the  Binet  Scale  was 
inadequate,  and  changes  were  made  where  it  was  deemed 
necessary.  The  tests  were  selected  with  the  intention  of 
covering  the  various  common  forms  of  the  principal  mental 
functions. 

Tables  I  and  II  on  the  pages  following  show  the  distri- 
bution of  the  tests  used  among  these  principal  mental  func- 
tions.* 

*  A  Point  Scale  for  Measuring  Mental  Ability,  pp.  7-9. 
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Table  I 
Tests 

1.  Auditory  memory  for  sentences,  attention. 

2.  Perception   (visual — of  things,  relations,  meanings),  ap- 

perception, association,  imagination. 

3.  Auditory  memory  for  words  (digits),  attention. 

4.  Discrimination — (a)   visual,    (b)    and   (c)    kinaesthetic. 
0.  Motor  coordination,  visual  perception. 

6.  Ideation  (association  and  analysis). 

7.  Esthetic    judgment    involving    perception,    association, 

and  analysis. 

8.  Perception,  apperception,  visual  memory,  imagination. 

9.  Association  (free),  vocabulary,  attention. 

10.  Analysis  and  comparison  of  remembered  objects,  atten- 

tion. 

11.  Memory,  imagination,  attention. 

12.  Practical  judgment  involving  memory  and  imagination. 

13.  Kinaesthetic  discrimination,  ideation    (notion  of  series), 

attention. 

14.  Imagination  and  command  of  language  forms. 

15.  Logical  judgment  based  on  imagination,  analysis,  and 

reasoning. 

16.  Suggestibility,  visual  perception,  comparison. 

16a.  Logical  judgment  based  on  analysis  and  reasoning.^ 

17.  Ideation  involving  vocabulary',  memory,  analysis. 

18.  Logical  judgment  based  on  analysis  and  reasoning,  at^ 

tention,  memory. 

19.  Visual  memory,  perception,  attention,  motor  coordina- 

tion. 

20.  Ideation    involving   analysis,    imagination,    command   of 

language  forms. 

Some  of  the  principles  for  the  selection  of  testing  ma- 
terial were  as  follows:  Other  things  being  equal,  prefer- 
ence was  given  to  tests  applicable  through  a  considerable 
range  of  years,  such  as  memory  span  and  free  association ; 
and  the  different  reactions  to  a  given  test  which  are  char- 
acteristic of  successive  stages  of  mental  growth  were  dis- 

^  This  is  an  extra  test,  introduced  as  a  possible  substitute  for  test  16 
at  a  time  when  that  seemed  likely  to  prove  unsatisfactory. 
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Mental  Processes 
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criminated  in  the  scoring  wherever  easily  recognizable.' 
For  example,  four  gradations  are  recognized  in  the  free 
association  test,  two  in  the  definition  of  concrete  terms,  four 
in  counting  backward,  and  so  on. 

Aside  from  the  points  mentioned  above  the  makers  of 
the  Point  Scale  state  further  reasons  for  the  superiority 
of  their  scale  over  the  Binet  Scale.  It  is  committed  to  no 
hypothesis  as  to  the  correlation  existing  between  chronolog- 
ical and  mental  age,  or  between  the  different  mental  func- 
tions at  different  stages  of  development.  It  is  capable  of 
giving  results  of  ever-increasing  reliability  and  precision 
as  data  accumulate  and  norms  are  established;  it  works 
with  a  smaller  amount  of  testing  material,  which  makes 
it  possible  to  select  better  material.  The  Point  Scale  con- 
sists of  20  tests  which  include  about  65  questions,  whereas 
the  Binet  pre-adolescent  scale  consisted  of  52  tests  with 
approximately  100  questions. 

Though  the  scale  described  above  is  for  children,  its 


*  A  Point  Scale  for  Measuring  Mental  Abilitrj,  p.  9. 
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makers  present  a  list  of  principles  for  a  universally  ap- 
plicable scale  for  mental  ability  that  will  include  adults 
as  well.    Among  these  principles  are  the  following: 

4.  Distribution  of  the  several  measurements  in  the  series 
equally  among  the  chief  mental  processes,  as,  for  example,  ac- 
cording to  the  four  following  categories : 

(a)  Receptivity,  including  such  functions  as  sensibility,  per- 
ceptivity, discrimination,  and  association. 

(&)  Imagination,  including  memory,  in  its  various  aspects, 
and  constructive  imagination. 

(c)  Affeetivity,  including  simple  feeling,  emotion,  sentiment, 
volition,  and  suggestibility. 

(d)  Thought,  including  ideation,  judgment,  and  reasoning. 

5.  Selection  of  twenty  parts  of  the  scale  so  that  there  shall  be 
five  for  each  of  the  groups  of  mental  functions,  classified  under 
the  headings  receptivity,  imagination,  affectivity,  and  thought. 

6.  A  minimum  of  200  points,  one-fourth  of  which  shall  belong 
to  each  of  the  above-mentioned  groups  of  processes. 

The  individual's  score  is  to  be  in  terms  of  the  four 
mental  processes  instead  of  throwing  the  scores  all  together 
and  a  norm  to  be  determined  for  all  mental  process. 

The  two  underlying  principles  of  the  Binet  Scale  are: 
first,  the  arrangement  of  the  tests  in  groups  corresponding 
to  the  years  of  chronological  age  and  the  consequent  ex- 
pression of  the  results  as  "mental  age,"  and  second,  the 
related  ''all-or-none"  method  of  scoring.  Many  objections 
are  raised  against  each  of  these  principles.  In  reference 
to  tlie  first  principle,  which  assumes  that  all  normal  in- 
dividuals develop  mentally  by  similar  stages,  that  the  cor- 
relation between  the  different  functions  is  the  same  for  all 
individuals  at  a  given  stage,  and  that  physical  and  mental 
development  correspond,  the  critics  of  the  scale  claim  that 
these  statements  are  not  warranted  by  the  facts.  On  the 
contrary,  it  is  pointed  out  that  such  studies  as  those  made 
by  Decroly  and  Degand  in  Belgium  show  that  on  the  aver- 
age the  Belgian  children  are  a  j'ear  and  a  half  in  advance 
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of  the  group  selected  by  Binet  at  Paris.  Binet's  attempt 
to  account  for  this  discrepancy  on  the  ground  that  the 
Belgian  children  belong  to  a  more  privileged  class  is  not 
accepted  as  scientific. 

Binet's  tests  are  criticized  also  on  the  ground  that  a 
range  of  six  or  seven  years  is  included  within  what  con- 
stitutes the  "normal  age."  Binet  said  that  the  children 
in  one  quarter  of  Paris  were  found  to  be  advanced  by 
four  or  even  five  years,  and  adds  that  "one  must,  therefore 
no  longer  consider  the  retardation  or  advance  of  three 
years  as  an  anomaly."  This  statement  is  criticized  on  the 
ground  that  this  is  a  very  large  proportional  variation  for 
a  scale  that  covers  at  most  but  twelve  years. 

In  scoring  the  examination,  the  examinee  is  credited  with 
that  age  at  which  he  passes  all  the  tests,  plus  one  year 
for  every  five  tests  passed  from  more  advanced  groups. 
The  more  difficult  tests  designed  for  advanced  ages  do  not 
count  any  more  than  those  lower  down  when  the  final  score 
is  being  reckoned.  This  seems  to  be  an  inequitable  distribu- 
tion of  credit  because  more  credit  should  be  given  for  tests 
passed  in  the  more  advanced  age-groups  than  for  tests 
passed  in  the  lower  age-groups. 

Proposed  Reorganization  of  the  Binet  Scale  by  Meu- 
mann. — Meumann^  has  indicated  the  lines  along  which 
the  Binet  Scale  should  be  reorganized  and  extended.  His 
proposals  for  reorganization  fall  under  three  heads  as 
follows : 

I.  Endowment  or  Intelligence  Tests  Proper. — Under  this 
heading  he  includes  tests  on: 

1.  Concentration  and  fixation  of  attention. 

2.  The  immediate  retention  of  verbal  and  non-verbal  material. 

3.  Imagination  and  thinking  by  means  of  descriptions  of  pic- 
tures. 

^  Vorlesungen,  Vol  II,  pp.  286-288. 
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4.  The  combination  test  improved  by  Ebbinghaus. 

5.  Employing  and  concentrating  on  visual  images  in  special 
problems,  as  in  the  folded  paper  test,  or  such  a  test  as:  What 
kind  of  a  solid  figure  is  produced  when  a  right  triangle  is  rotated 
around  one  of  its  sides  making  the  right  angle? 

6.  Thinking  by  means  of  controlled  association  test. 

7.  The  concentration  and  synthesis  through  comparisons  and 
differentiations  of  different  difficult  objects  both  in  the  presence 
of  the  objects  and  in  recollection. 

II.  Tests  of  Development  in  the  Narrower  Sense. — Here 
would  be  given: 

1.  Vocabulary  tests. 

2.  Tests  measuring  the  range  of  attention  and  the  memory 
span. 

3.  Tests  determining  the  examinee's  ability  to  reproduce  words, 
as  in  the  opposite  test. 

4.  Tests  in  temporal  orientation,  and  so  on. 

III.  Tests  of  Environment. — These  tests  may  be  divided 
into  three  divisions: 

1.  Testing  school  knowledge,  as  knowledge  of  number,  coins, 
days  of  week,  dates,  names  of  months,  writing  from  a  copy,  and 
writing  from  dictation. 

2.  Knowledge  acquired  at  home ;  meaning  of  words  designating 
household  articles,  above  and  below,  before  and  after,  left  and 
right,  correctness  in  speech,  testing  errors  in  speech,  number  of 
the  fingers,  and  so  on. 

3.  Spontaneous  observation,  by  making  inventories  of  mental 
content.* 

Wallin's  Criticisms. — Wallin  criticizes  the  1911  edition 
of  the  Binet  Scale  on  the  ground  that  the  number  of  tests 
in  each  group  should  have  been  increased  rather  than 
diminished  and  a  greater  number  of  functions  covered. 
In  regard  to  the  principle  of  an  age-scale,  Binet 's  critics 
claim  he  surrendered  its  validity  on  many  occasions,  as, 

*  Cf.  Robert  R.  Rusk,  Experimental  Education  (Longmans,  Green 
&  Co.  1919),  pp.  185-187. 
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for  instance,  when  he  accounts  for  Belgian  children  test- 
ing higher  than  those  of  Paris  because  of  their  having 
a  more  favorable  environment ;  also  when  he  takes  excep- 
tion to  Katherine  Johnson's  results  because  she  treated 
as  a  single  group  children  from  different  levels  of 
privilege;  and  again  when  he  tells  of  the  wide  range 
among  normal  children. 

Other  Tests  Devised  to  Measure  Intellig-ence. — Many- 
attempts  have  been  made  by  still  others  to  devise  tests 
to  measure  intelligence.  Some  have  attempted  to  improve 
the  Binet  tests  while  others  have  devised  entirely  new 
ones.  The  following  have  attracted  considerable  atten- 
tion. 

Burt^  has  attempted  to  improve  the  method  of  record- 
ing the  mental  development  of  children.  Instead  of  using 
the  mental  age  as  a  measure  for  backwardness  and  mental 
deficiency,  he  has  used  the  standard  deviation  (for  a 
definition  of  standard  deviation  see  Chapter  XI)  of 
normal  pupils,  that  is,  the  pupils  who  are  in  the  usual 
class  in  school  for  their  chronological  age.  Taking  the 
standard  deviation  for  the  various  grades  he  found  it  to 
be  about  one-tenth  of  the  chronological  age.  That  is,  the 
standard  deviation  of  a  normal  child  10  years  old  is  one- 
tenth  of  his  chronological  age,  or  one  year.  A  child  five 
years  old  would  have  a  standard  deviation  of  a  half  year 
and  one  15  years  old  would  have  a  standard  deviation 
of  one  and  one-half  years.^°  He  found  that  by  taking  the 
standard  deviation  as  a  measure  a  child  who  was  retarded 
by  more  than  three-tenths  of  his  age  needed  a  special 
school  for  his  training.    He  says  :^^ 

.  .  .  for  practical  purposes,  "backward"  may  be  taken  to 
denote  children  who,  though  not  "defective,"  are  yet  unable,  in 

'  C.  Burt,  The  Distribution  and  Relations  of  Educational  Abilities 
(King  and  Son,  London) . 

10 /bid,  p.  31.  "/6id.,  p.  82. 
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the  middle  of  their  school  career  to  do  the  work  even  of  the  class 
below  their  age;  or  more  exactly,  children  who  deviate  below  the 
normal  by  at  least  one  and  a  half  times  the  ''standard"  deviation 
of  individuals  of  the  same  age  group ;  and  therefore  are  retarded 
by  15  to  30  per  cent  of  their  age. 

This  method  of  recording  mental  age  is  of  special  im- 
portance because  it  has  been  proposed  to  use  a  similar 
method  in  measuring  school  achievements. 

Meumann  points  out  that  in  spite  of  the  defects  of  the 
isolated  test  methods  (discussed  in  first  part  of  Chapter 
III) ,  these  tests  should  not  be  overlooked.  They  may  be  em- 
ployed as  preparatory  tests  in  segregating  students  for 
further  examination  and  they  are  valuable  also  as  a  means 
of  determining  quantitatively  particular  capacities. 

Rossolimo  has  an  interesting  and  unique  way  of 
graphically  representing  one's  mental  make-up.  His 
method  assesses  numerically  eleven  psychical  processes, 
each  of  which  has  a  value  assigned  to  it  varying  from 
zero  to  ten.     The  eleven  processes  tested  are  attention, 
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will,  apprehension,  visual  memory,  memory  for  the  ele- 
ments of  speech,  numerical  memory,  comprehension,  com- 
bination, mechanical  sense,  imagination,  and  observation. 
All  of  these  have  several  subdivisions.  There  are  ten 
tests  for  each  process  or  partial  process  and  the  score 
for  each  process  is  indicated  by  a  mark  on  a  table.  The 
points  are  then  joined  up  and  the  whole  figure  results 
in  what  is  known  as  a  "mental  profile."  ^^  The  score  for 
each  process  is  indicated  by  the  mark  on  the  table. 

De  Sanctis  devised  a  series  of  six  tests  primarily  to 
measure  not  the  level  of  intelligence,  as  do  the  Binet  tests, 
but  the  degree  of  mental  defection. 

A  general  idea  of  the  de  Sanctis  tests  may  be  gained 
from  the  description  given  below: 

1.  Five  balls  each  of  a  different  color  are  placed  before  the 
subject  and  the  examiner  says:  Give  me  a  ball.  The  time  and 
manner  of  the  response  is  noted. 

2.  The  same  five  balls  are  arranged  in  a  row  and  the  examiner 
says:  Which  is  the  ball  you  just  gave  mef  The  time  of  the 
response  is  again  noted. 

3.  The  subject  is  shown  a  wooden  cube  such  as  is  used  in  a 
kindergarten  and  the  examiner  says:  Do  you  see  this  block  of 
wood?  After  the  subject  has  noted  it,  the  examiner  continues: 
Pick  out  all  the  blocks  like  this  on  the  table.  On  the  table  have 
been  placed  three  cones,  five  cubes  and  two  parallelepipeds.  The 
time  for  selecting  and  arranging  the  cubes  is  noted. 

4.  The  examiner  shows  the  subject  a  cube  and  says:  Do  you 
see  this  block?  After  noting  the  block  the  examiner  continues: 
Point  out  a  figure  on  the  form  chart  that  looks  like  it.  The  form 
chart  consists  of  ten  rows  of  squares,  triangles  and  rectangles 
with  fourteen  figures  in  each  row,  or  140  figures  in  all.  If  the 
subject  points  to  a  square,  the  examiner's  next  command  is: 
Take  this  pencil  {or  pointer)  and  point  out  all  the  squares  on 
the  chart  as  fast  as  possible,  without  missing  any,  taking  thei 
figures  line  by  line.    The  time,  mistakes,  and  omissions  are  noted. 

\'  "Mental  Profiles:  A  Quantitative  Method  of  Expressing  Psycho- 
logical Processes  in  Normal  and  Pathological  Cases,"  Journal  of 
Experimental  Pedagogy,  Vol.  1,  pp.  211-214. 
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5.  Additional  blocks  are  placed  on  the  table  in  such  a  manner 
that  the  distance  between  the  cubes  is  not  more  than  two  centi- 
meters. Each  cube  should  be  just  one-half  centimeter  longer  on 
each  side  than  the  next  smaller  one.  If  it  is  desired  to  make 
the  test  more  difficult,  the  number  of  cubes  may  be  increased  or 
the  difference  in.  size  decreased.  The  examiner  then  says :  Here 
are  some  more  blocks  like  tJiose  you  have  pointed  out  on  the 
chart.  Look  at  them  carefully  and  tell  me:  (1)  How  many  there 
are.  (2)  Which  is  the  largest.  (3)  Which  is  the  farthest  avxty 
from  you.    The  time,  errors  and  omissions  are  noted. 

6.  The  examiner  asks:  Do  large  objects  weigh  more  or  less 
than  small  objects?  Why  does  a  small  object  sometimes  weigh 
more  than  a  large  one?  The  second  question  is  asked  if  the  first 
one  has  been  answered  correctly.  The  subject  is  then  asked :  Do 
distant  objects  appear  larger  or  smaller  than  near  objects?  Do 
they  only  seem  smaller  or  are  they  really  smaller?  (The  last 
question  will  show  whether  the  subject  is  aware  of  optical  illu- 
sions.) 

De  Sanctis  points  out  that  if  the  subject  does  not  pass 
the  second  test  the  mental  deficiency  may  be  considered 
of  a  high  grade.  If  the  subject  cannot  go  beyond  the 
fourth  test,  or  if  he  makes  many  mistakes,  or  is  not  at  all 
certain  in  the  fifth,  the  mental  deficiency  may  be  considered 
slight.  If  the  sixth  test  is  completed  without  a  mistake  the 
subject  may  be  said  to  present  no  mental  deficiency. 

II.  Group  Intelligence  Tests 

One  of  the  greatest  drawbacks  to  the  Binet  Scale  from 
an  administrative  standpoint  is  the  fact  that  the  tests  must 
be  given  to  but  one  individual  at  a  time.  This  makes  their 
administration  a  great  burden  if  the  number  of  persons 
to  be  given  the  tests  is  very  great.  In  order  to  overcome 
this  difficulty,  group  tests  for  mental  ability  have  been 
introduced.  The  gain  from  an  administrative  standpoint, 
however,  has  not  been  an  "unmitigated  blessing,"  since 
much  had  to  be  sacrificed  in  order  that  the  test  might  be 
given  to  groups  of  children. 
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On  the  other  hand,  the  group  intelligence  tests  were  used 
in  the  American  army  with  marked  success.  The  experi- 
ence with  the  drafted  men  in  the  army  proved  that  not 
only  individual  tests  of  intelligence,  but  group  mental  tests 
could  be  used  to  good  advantage.  The  tests  were  thus 
introduced  into  the  army,  and  psychological  methods  were 
used  in  the  selection  of  its  personnel. 

The  American  army  was  the  only  army  that  made  use 
of  intelligence  tests  in  the  selection  of  officers.  As  a  result 
of  the  application  of  the  group  intelligence  tests  to  hun- 
dreds of  thousands  of  men,  and  individual  tests  to  a  very 
large  number  of  the  same  men,  correlations  were  made 
between  the  two  test  series,  and  between  the  single  tests 
and  various  other  criteria  for  judging  intelligence.  By 
these  methods  the  group  tests  were  found  to  be  very  re- 
liable. 

For  the  army  Beta  group  mental  tests,  no  educational 
training  was  necessary,  not  even  the  ability  to  read  and 
write.  The  tests  were  given  to  illiterate  foreigners  who 
did  not  know  the  English  language  and  also  to  deaf 
mutes.  Some  of  the  illiterate  foreign  recruits  made  scores 
higher  than  the  average  of  the  commissioned  officers. 

For  the  army  Alpha  group  intelligence  tests,  ability  to 
read  and  write  and  to  understand  English  is  essential; 
therefore,  education  through  the  third  and  fourth  grades 
is  probably  necessary,  but  after  this  point,  educational 
training  has  little  effect  upon  the  results.  As  a  matter 
of  fact,  some  with  almost  no  educational  training  made 
among  the  highest  scores. 

Principles  Involved  in  the  Selection  of  Group  Tests. — 
The  first  requirement  in  the  selection  of  any  test  is  that 
its  beginning  be  easy  enough  and  the  directions  clear 
enough  that  all  the  children  may  be  able  to  do  a  part 
of  the  test. 

One  of  the  purposes  for  devising  and  standardizing 
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group  tests  is  to  provide  a  scale  for  measuring  the  intelli- 
gence of  large  groups  of  children  with  sufficient  accuracy 
to  sort  out  all  children  of  questionable  mentality.  An- 
other purpose  is  to  obtain  a  group  scale  that  will  dis- 
criminate between  dull,  average,  and  supernormal  chil- 
dren. When  group  tests  are  given  one  can  be  reasonably 
certain  that  children  who  make  exceptionally  low  scores 
will,  when  tested  individually  by  the  Binet  revisions,  be 
found  to  be  mentally  deficient.  This  method  saves  much 
time. 

Requirements  of  Group  Tests  Are  Many. — They  must 
not  only  possess  the  characteristics  necessary  for  the  in- 
dividual tests  but  satisfy  other  demands.  Simplicity  is 
an  important  criterion  for  the  selection  of  group  tests; 
simplicity  of  material,  of  directions,  of  response,  and  scor- 
ing. Since  a  large  group  of  children  are  to  be  tested 
at  one  time,  it  is  necessary  that  the  tests  be  selected  in 
which  the  material  used  can  be  easily  carried  and  quickly 
distributed. 

Simplicity  of  directions  is  one  of  the  very  essential 
characteristics  of  group  tests  for  the  lower  grades,  espe- 
cially those  below  the  third.  The  following  experience 
recorded  by  Miss  Frances  Lowell  illustrates  the  point  in 
question.     She  says: 

Frequently  one  feels  confident  that  the  directions  for  a  certain 
test  are  perfectly  clear  and  that  they  could  not  possibly  be 
misunderstood,  and  yet  when  the  test  is  given  to  a  group  of 
children  he  finds  that  the  meaning  is  entirely  lost.  Good  English 
must  frequently  be  sacrificed,  for  the  child  in  the  primary  grades 
is"  surprisingly  limited  in  vocabulary.  Originally  in  giving  the 
directions  for  the  one  of  the  six -year  tests  the  writer  said: 
"Make  a  cross  in  the  largest  square."  The  result  was  puzzling, 
for  the  children  crossed  the  smallest  square  as  often  as  they  did  the 
largest.  The  test  apparently  was  a  failure  for  that  age-group. 
However,  before  discarding  it  the  writer  decided  to  experiment  a 
little  to  see  if  the  difficulty  could  be  discovered.  Instead  of  having 
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the  children  cross  the  square  they  were  asked  to  point  to  the  larg- 
est square,  whereupon  one  little  girl  tearfully  informed  the  writer 
that  she  didn't  know  what  that  meant.  That  solved  the  problem; 
from  then  on  children  were  instructed  to  make  a  cross  in  the 
biggest  squaxe. 

The  Terman  Group  Tests  of  Mental  Ability. — Dr.  Lewis 
M.  Terman  devised  a  series  of  group  tests  for  grades  7 
to  12  inclusive.  Like  the  Binet  tests  they  consist  of  a 
series  of  questions  and  problems  selected  from  a  large 
mass  of  possible  test  material.  Each  separate  item  was 
correlated  with  a  dependable  measure  of  mental  ability. 
The  preliminary  trial  series  from  which  the  tests  were 
finally  made  was  composed  of  a  two-hour  test  of  13 
parts  with  886  different  items.  As  a  result  of  the  prelimi- 
nary tryout,  three  of  the  13  parts  were  eliminated  and 
the  number  of  items  in  the  remaining  ones  reduced  to 
370.  If  an  item  failed  to  differentiate  pupils  of  known 
brightness  from  pupils  of  known  dullness,  it  was  elimi- 
nated. Much  attention  was  given  to  simplicity  and  con- 
venience in  making  up  the  tests.  The  tests  are  issued  in 
two  forms,  A  and  B.  Each  contains  185  questions  or 
problems.  In  any  test  of  intelligence  there  is  always  a 
margin  of  error.  The  author  of  this  test  suggests  that 
both  forms  A  and  B  be  used  and  that  the  average  result 
obtained  from  both  forms  be  used  as  a  possible  basis  for 
guidance  of  the  individual  pupil.  Each  form  consists  of 
10  groups  of  tests.  Each  group  has  a  number  of  ques- 
tions or  problems  to  be  solved. 

Test  1.  Information. — It  consists  of  20  parts.  The  pupil  is 
told  to  draw  a  line  under  the  one  word  that  makes  the  sentence 
true.  Example:  Coffee  is  a  kind  of  hark,  berry,  leaf,  root. 
Time,  two  minutes. 

Test  2.  Best  Answer. — It  consists  of  11  parts.  The  examinee 
is  asked  to  put  a  cross  before  the  best  answer  to  the  question  or 
statement  made.     Example:    Spokes  of  a  wheel  are  often  made 


126  EDUCATIONAL  MEASUREMENT 

of  hickory  hecatise:  (1)  hickory  is  tough,  (2)  it  cuts  easily,  (5) 
it  takes  paint  nicely.     Time,  two  minutes. 

Test  3.  Word  Meaning. — It   consists   of   30   pairs   of   words 

an'anged  as  follows :  Expel  —  retain same  —  opposite. 

The  instructions  are:  "If  the  words  mean  the  same,  the  examinee 
is  to  underscore  the  word  same,  if  they  mean  the  opposite,  under- 
score the  word  opposite.     Time,  two  minutes. 

Test  4.  Logical  Selection. — The  instructions  are  to  draw  a 
line  under  two  words  that  the  thing  always  has;  underline  two 
and  only  two  in  each  line.  Example:  A  horse  always  has  harness, 
hoofs,  shoes,  stable,  tail. 

Test  5.  Arithmetic. — This  test  consists  of  12  problems  of  which 
the  one  given  below  is  a  sample:  "How  many  hours  will  it  take 
a  person  to  go  66  miles  at  the  rate  of  6  miles  an  hour?" 

Test  6.  Sentence  Meaning. — It  consists  of  24  parts.  Instruc- 
tions: "Draw  a  line  under  the  right  answer."  Example:  Does 
a  conscientious  person  ever  make  a  mistake?     Yes,  No. 

Test  7,  Analogies. — Consists  of  20  sentences  in  which  the 
analogies  are  to  be  picked  out.  Example:  Ear  is  to  hear  as 
eye  is  to  table,  see,  hand,  play.  The  examinee  is  required  to 
underscore  the  proper  word.    Time,  two  minutes. 

Test  8.  Mixed  Sentences. — Consists  of  18  sentences  with 
words  not  arranged  in  proper  order.  The  instructions  are  to 
arrange  the  words  in  correct  order  and  then  tell  whether  the 
sentence  is  time  or  false  by  underscoring  one  of  these  words. 

Example:  (1)  True  bought  cannot  friendship  be true,  false. 

Time,  three  minutes. 

Test  9.  Classification. — Consists  of  18  groups  of  words,  one 
word  in  each  group  of  which  does  not  belong  to  that  class.  The 
instructions  are  to  cross  out  the  word  that  does  not  belong  there. 
Example:  Automobile,  bicycle,  buggy,  telegraph,  tram.  Time, 
three  minutes. 

Test  10.  Number  Series. — Consists  of  12  rows"  of  numbers 
arranged  in  such  a  way  that  the  examinee  can  predict  what  the 
next  two  missing  numbers  should  be  by  the  arrangement  of  the 
numbers  that  are  given.  Example  :  3  —  8  —  13  —  18  —  23  — 
28  —  ?  —  ? 

Examination  Form  B  is  constructed  on  the  same  plan 
as  Form  A  and  is  of  approximately  the  same  difficulty. 
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The  scoring  is  done  by  ten  scoring  keys  which,  for 
convenience,  are  printed  on  a  single  sheet. 

Mental  ability  should  be  the  fundamental  basis  for 
all  grading  promotion  and  classification  of  pupils.  Ter- 
man  emphasizes  the  fact  that  the  purpose  of  these  tests 
is  "to  make  a  difference  in  the  educational  treatment  of 
pupils,  not  to  gratify  a  merely  idle  curiosity  regarding 
their  intellectual  status." 

Accordingly,  one  of  the  chief  functions  of  these  tests 
is  in  the  classification  of  pupils  into  different  groups  for 
special  types  of  instruction,  as  in  the  Oakland  Plan,  which 
divides  the  pupils  into  three  groups — the  bright,  the  aver- 
age, and  the  slow,  for  separate  instruction  according  to 
their  needs. 

The  use  of  the  tests  will  clear  up  many  misunderstand- 
ings regarding  individual  children.  They  help  to  give 
reasons  for  the  seeming  misfits  in  school.  They  give  edu- 
cational guidance  that  helps  to  simplify  the  problems  of 
vocational  guidance.  It  is  not  claimed,  of  course,  that 
the  tests  are  infallible,  but  it  is  claimed  that  they  make 
a  very  important  point  of  departure  for  further  study  of 
the  pupil. 

The  National  Intelligence  Tests.— In  March,  1919,  the 
General  Education  Board  granted  the  National  Research 
Council  the  sum  of  $25,000  to  be  used  in  devising  methods 
for  measuring  the  intelligence  of  school  children.  The 
Research  Council  caused  to  be  organized  a  committee  to 
do  this  work.  The  committee  was  composed  of  Messrs. 
R.  M.  Yerkes,  chairman,  M.  E.  Haggerty,  L,  M.  Terman, 
E.  L.  Thorndike,  and  G.  M.  Whipple. 

The  committee  decided  to  prepare  a  series  of  tests  for 
children  from  grades  3  to  8  inclusive.  From  a  great  mass 
of  data,  material  was  selected  for  21  tests  for  a  prelimi- 
nary trial.  Children  in  a  number  of  eastern  cities,  includ- 
ing "Washington,  D.  C,  Cleveland,  New  York,  Richmond, 
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and  Alexandria,  were  given  the  preliminary  tests.  Dr. 
Truman  Kelley  statistically  analyzed  the  data  obtained 
from  this  preliminary  trial.  Guided  by  Dr.  Kelley 's  report 
and  other  information  relative  to  the  characteristic  be- 
iiavior  of  the  several  tests,  the  committee  selected  10  tests 
from  the  list  of  21  for  further  use. 

These  ten  tests  were  arranged  in  two  groups,  or  series, 
of  five  tests  each.  The  tests  selected  are  an  adaptation 
for  school  purposes  of  the  group  intelligence  tests  used 
in  the  examination  of  recruits  in  the  United  States  Army. 

Each  scale  is  a  complete  unit  for  testing;  but  it  is 
recommended  that  both  scales  be  used  in  order  that  one 
may  serve  as  a  check  on  the  other.  A  short  preliminary 
exercise  is  given  before  each  regular  test  to  acquaint  the 
pupil  with  the  general  nature  of  the  work  to  be  done. 
The  tests  in  Scale  A  consist  of: 

Test  1.  Arithmetical  Reasoning. — This  test  consists  of  16  prob- 
lems growing  progressively  more  difficult  from  the  first  to  the 
sixteenth.  The  time  allotted  to  this  test  is  five  minutes.  The 
instructions  given  the  pupils  are  to  solve  as  many  problems  as 
they  can  in  the  time  allotted. 

Test  2.  Sentence  Completion, — This  test  consists  of  20  sen- 
tences, parts  of  which  have  been  left  out.  The  instructions  are 
to  fill  the  blanks  with  words  to  make  the  sentences  sound  sensible 
xind  right.     Time,  four  minutes. 

Test  3.  Logical  Reasoning. — Consists  of  24  parts.  Part,  or 
row,  1  is  given  here  to  indicate  the  general  nature  of  the  test : 

Elephant   (circus,  ears,  hay,  keeper,  trunk) 

The  instructions  are:  "In  each  row  draw  a  line  under  each 
of  the  two  words  that  tells  what  the  thing  always  has."  Time, 
three  minutes. 

Test  4.  Same-Opposite. — Consists  of  40  pairs  of  words 
separated  by  a  blank  line,  as  new — old,  still — noisy,  fall — drop, 
etc.  The  instructions  are  to  write  "S"  between  them  if  they 
mean  the  same  and  "D"  if  they  mean  different  things. 
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Test  5.  Symbol-Digit. — Consists  of  nine  irregular-shaped  char- 
acters arranged  in  six  rows  with  20  characters  in  a  row.  The 
nine  digits,  1,  2,  3,  etc.,  are  placed  in  a  line  under  the  nine  char- 
acters, each  digit  corresponding  to  a  character.  This  is  called 
the  key.  The  instructions  are  to  make  under  each  drawing  or 
character  the  number  you  find  under  the  drawing  in  the  key. 
Time,  three  minutes. 

Scale  B  consists  of  five  tests  as  follows: 

Test  1.  Computation. — This  test  is  made  up  of  22  problems 
in  arithmetic.  The  problems  grow  progressively  more  difficult 
as  the  pupil  works  down  the  page.    Time,  four  minutes. 

Test  2.  Information. — It  consists  of  a  series  of  40  sentences, 
each  sentence  containing  four  words,  only  one  of  which  makes 
the  sentence  true.  Sentence  Number  1  reads  as  follows :  The 
day  before  Sunday  is  Friday,  Monday,  Saturday,  Tuesday.  The 
instructions  are  to  draw  a  line  under  the  one  word  that  makes 
the  sentence  true.    Time,  four  minutes. 

Test  3.  Vocabulary. — Forty  questions  to  be  answered  by  "yes" 
or  "no"  are  given.    Time,  three  minutes. 

Test  4.  Analogies. — This  test  consists  of  32  parts,  each  part 
consisting  of  seven  words.  The  first  two  words  bear  a  definite 
relation  to  one  another.  The  third  word  bears  an  analogous 
relation  to  one  of  the  four  succeeding  words  in  the  line.  Part 
1  of  this  test  is  given  here  to  illustrate  the  point:     Finger — 

hand toe,   box,  foot,   doll,   coat.      The   instructions   are: 

Read  the  first  three  words  in  each  line.  Then  read  the  last  four 
words  and  draw  a  line  under  the  right  one. 

Test  5.  Comparison. — The  test  consists  of  50  figures,  names, 
and  drawings,  arranged  in  pairs,  with  a  dotted  line  between 
them.  The  instructions  are:  "If  the  two  things  in  a  pair  are 
the  same  write  'S',  if  they  are  different  write  'D.' " 

The  Haggerty  Intelligence  Examinations. — Dr.  M.  E. 

Haggerty  devised  a  series  of  intelligence  tests  known  as 
Intelligence  Examinations,  Delta  1  and  Delta  2.  Delta  1 
consists  of  six  groups  of  tests  designed  for  children  of 
grades  one  to  three.  Delta  2  is  composed  of  the  same 
number  of  tests  and  is  designed  for  children  in  grades 
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three  to  nine.  Each  of  these  tests  is  preceded  by  a  fore- 
exercise  which  gives  the  pupil  an  idea  of  what  he  is  to 
do  in  the  regular  test. 

The  material  in  Delta  1  is  of  such  a  nature  that  a  child 
unable  to  read  may  do  several  of  the  exercises.  The  first 
regular  test  is  one  of  oral  directions.  It  consists  of  a  page 
of  pictures  and  depends  on  the  oral  directions  of  the 
examiner  to  do  the  test.  For  instance,  the  first  picture 
is  a  mouse  and  the  oral  directions  are  to  draw  a  ring 
around  the  mouse. 

Test  2.  Copying  Designs. 

Test  3.  Picture  Completion. — A  part  of  the  picture  is  left  out 
and  the  child  is  asked  to  supply  the  missing  parts. 

Test  4.  Picture  Comparison. — Here  each  line  contains  two 
pictures  separated  by  a  blank  line.  If  the  pictures  are  the  same 
the  pupil  writes  "S"  on  the  blank  line;  if  they  are  different  he 
writes  "D." 

Test  5.  Symbol-Digit. — The  nature  of  the  symbol-digit  test 
was  explained  in  the  discussion  of  the  National  Intelligence  Tests. 

Test  6.  Word  Comparison. — This  is  the  first  of  the  series  that 
requires  a  reading  ability  on  the  part  of  the  examinee.  This 
test  is  like  the  word  comparison  test  described  above. 

Delta  2  consists  of  six  tests  as  follows: 

Test  1.  Sentence  Reading. — It  consists  of  a  series  of  40  ques- 
tions growing  progressively  more  difficult  that  may  be  answered 
by  "yes"  or  "no."     Time,  five  minutes. 

Test  2.  Arithmetical  Problems. — A  list  of  20  problems  is  given 
and  the  pupil  is  asked  to  solve  as  many  as  he  can  in  five  minutes. 

Test  3.  Picture  Completion. — Time,  four  minutes. 

Test  4.  Synonym-Antonym. — Consisting  of  40  pairs  of  words. 
This  test  is  the  same  as  the  same-opposite  test  described  above. 

Test  5.  Practical  Judgment. — It  consists  of  a  series  of  16 
sentences  and  questions,  each  of  which  has  three  answers,  or 
reasons  for  being,  one  of  which  is  right.  The  child  is  asked  to 
put  a  cross  before  the  best  answer  to  the  statement  or  question. 
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Test  6.  Informaticm. — Consists  of  40  sentences.  It  is  similar 
to  the  one  described  under  the  National  Intelligence  Tests. 

The  Otis  Group  Intelligence  Scale. — The  Otis  Group 
Intelligence  Scale  is  designed  to  test  general  mental  ability. 
The  scale  is  issued  in  two  series,  a  Primary  Examination 
and  an  Advanced  Examination.  The  Primary  Examina- 
tion is  designed  especially  for  the  kindergarten,  and  for 
grades  1  to  4,  and  consists  of  eight  tests  which  do  not 
involve  the  ability  to  read.  The  Advanced  Examination, 
consisting  of  ten  tests  is  designed  for  grades  5  to  12;  in 
fact,  for  all  literary  persons,  including  university  students. 

In  each  series,  each  test  is  independent  of  every  other 
test,  but  the  tests  taken  together  form  a  scale,  which  is 
printed  in  the  form  of  an  examination  booklet.  The  ad- 
vanced examination  consists  of  ten  tests,  each  test  con- 
sisting of  a  series  of  questions  and  problems.  The  tests 
are  as  follows:  (1)  following  directions  (2)  opposites, 
(3)  disarranged  sentences,  (4)  proverbs,  (5)  arithmetic, 
(6)  geometric  figures,  (7)  analogies,  (8)  similarities,  (9) 
narrative  completion,  and  (10)  memory.  This  advanced 
examination  is  suitable  for  testing  all  students  from  the 
fifth  grade  upward  through  tne  high  school  and  the  uni- 
versity. 

The  Otis  tests  are  very  popular  and  are  praised  very 
highly  by  others  who  have  designed  group  tests  of  intel- 
ligence. Terman  writes  as  follows  about  the  Otis  Ad- 
vanced Examination:^^ 

It  is  applicable  to  any  individual,  whether  child  or  adult,  who 
has  had  the  equivalent  of  three  or  four  years  of  schooling.  With 
subjects  of  this  amount  of  schooling  the  Otis  Scale  probably 
comes  as  near  testing  raw  "brain"  power  as  any  system  of  tests 
yet  devised.  Indeed,  it  was  the  first  scientifically  grounded  and 
satisfactory  scale  for  testing  subjects  in  groups.  ...  No  one 
else  has  done  so  much  as  Dr.  Otis  to  free  intelligence  tests  from 


^*  The  Otis  Group  Intelligence  Scale,  Manual  of  Directions,  p.  3. 
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the  influence  of  the  personal  equation  of  the  examiner.  Perhaps 
it  is  too  much  to  hope  that  any  mental  tests  can  be  made  "fool 
proof"  but  it  is  not  too  much  to  say  that  the  Otis  Scale  can  be 
correctly  given  and  correctly  scored  by  any  one  who  is  intelligent 
enough  to  teach  school.  The  plan  of  arranging  the  tests  so  that 
they  may  be  scored  by  the  use  of  stencils  is  a  contribution  of 
both  practical  and  scientific  importance. 

The  Dearborn  Group  Intelligence  Tests. — Professor 
Dearborn  has  designed  two  series  of  tests,  one  for  grades 
from  1  to  3,  and  the  other  for  grades  4  to  9  inclusive. 
Series  I  consists  of  general  examination  1,  2,  and  3.  Series 
II  consists  of  general  exminations  4  and  5.  Though 
these  tests  differ  from  the  other  group  tests  described 
above,  they  are  very  similar  to  them  in  many  respects  and 
involve  practically  the  same  mental  functions  as  the  other 
tests. 

Uses  of  Intelligence  Tests. — The  uses  of  intelligence 
tests  are  many  and  varied.  They  are  beginning  to  play 
an  important  part  in  determining  vocational  fitness.  No 
one  will  claim,  of  course,  that  they  will  tell  us  unerringly 
for  which  of  a  thousand  or  more  occupations  an  individual 
is  best  fitted.  Sometime  in  the  near  future,  however, 
industries  will  begin  to  set  the  minimum  intelligence 
quotient  for  employees  in  their  business.  We  now  know 
that  of  the  people  who  belong  to  the  ranks  of  the  industrial 
inefficient  there  is  a  very  high  percentage  who  are  of  sub- 
normal intelligence.  Of  150  ''hoboes"  that  Mr.  Knollin 
tested  a  few  years  ago,  15  per  cent  belonged  to  the  moron 
grade  of  mental  deficiency  and  almost  as  many  were 
borderline  cases.^* 

A  bulletin  published  by  the  War  Department  on  Army  ^^ 
mental  tests  shows  the  intelligence  level  for  the  various 


1*  Cf.  Lewis  M.  Terman,  The  Measurement  of  Intelligence,  p.  54. 
^'^  Army  Mental  Tests,  Methods,  Typical  Results  and  Practical  Appli- 
cations, November  22,  1918,  Washington,  D.  C. 
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occupations  as  determined  by  intelligence  tests.    The  scores 
for  some  of  the  occupations  are  given  below: 

45    to      49. — Farmer,  laborer,  general  miner,  and  teamster, 

50    to      54. — Stationary  gas  engine  man,  house  hostler,  horse- 

shoer,  tailor,  general  boilermaker,  and  barber. 
55    to      59. — General  carpenter,  painter,  heavy  truck  chauffeur, 
horse    trainer,    baker,    cook,    concrete    or    cement 
worker,    mine    drill   runner,    bricklayer,    cobbler, 
caterer. 
60    to      64. — General  machinist,  lathe  hand,  general  blacksmith, 
brakeman,    locomotive    fireman,    auto    chauffeur, 
telegi-aph  and  telephone  lineman,  butcher,  bridge 
carpenter,     railroad     conductor,     railroad     shop 
mechanic,  locomotive  engineer. 
65    to      69. — Laundryman,    plumber,   auto    repairman,   general 
pipefitter,  auto  engine  mechanic,  auto  assembler, 
general   mechanic,   tool   and   guage   maker,   stock 
checker,  detective  and  policemen,  toolroom  expert, 
ship     carpenter,     gunsmith,     marine     engineman, 
hand-riveter,  telephone  operator. 
74. — Truckmaster,  farrier,  and  veterinarian. 
79. — Receiving  clerk,  shipping  clerk,  stock  keeper. 
84. — General    electrician,    telegrapher,    band    musician, 

concrete  constructor  foreman. 
89. — PhotogTapher. 
94. — Railroad  clerk. 
99. — General  clerk,  filing  clerk. 
100    to    104.— Bookkeeper. 
105    to    109. — Mechanical  engineer. 
110    to    114. — Mechanical  draughtsman, 
lib    to    119. — Stenographer,    typist,    accountant,    civil    engineer, 

Y.  M.  C.  A.  secretary,  medical  officer. 
125  and  over. — Army  chaplains,  engineer  officers. 

"With  the  educational  levels  of  the  various  occupations 
thus  determined,  those  attempting  to  give  vocational 
guidance  may  have  the  sanction  of  science  in  the  judg- 
ments they  make.  Suppose,  for  example,  a  pupil  aspired 
to  be  a  mechanical  engineer.     The  intelligence  level  for 
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mechanical  engineers  is  from  105  to  109,  Suppose  further 
that  the  intelligence  level  of  this  particular  pupil  is  only  82. 
Then  the  vocational  counselor  may  say  to  him  that  he  is 
aspiring  to  do  work  on  an  educational  level  several  units 
beyond  the  level  for  which  his  general  intelligence  fits  him 
and  that,  if  he  persists  in  his  attempts  to  be  a  mechanical 
engineer,  he  will  find  it  extremely  difficult  to  succeed  be- 
cause he  does  not  have  the  potential  power  to  succeed 
on  that  level. 

There  are  many  things  that  the  vocational  counselor 
must  bear  in  mind  when  attempting  to  measure  intelligence 
and  determine  the  vocational  fitness  of  individuals  for  the 
various  businesses  and  professions.  It  may  be,  as  Thorn- 
dike  says,  that  we  have  three  intelligences  instead  of  one 
general  intelligence,  namely  abstract  intelligence,  social 
intelligence,  and  mechanical  intelligence.  The  intelligence 
tests  we  now  employ  are  primarily  designed  to  measure 
abstract  intelligence,  the  ability  to  do  abstract  reasoning 
in  arithmetic,  logic,  to  deal  successfully  with  abstract 
ideas,  etc.  An  individual  may  make  a  high  score  with 
tests  of  this  kind  and  yet  score  low  in  matters  of  social 
intelligence.  The  following  example  will  make  the  matter 
clear:  College  professors,  teaching  mathematics,  are 
usually  considered  to  be  good  reasoners.  Mathematics  is 
supposed  to  bring  into  play  the  reasoning  faculties  of  the 
mind.  If  one  is  a  good  reasoner  in  mathematics,  will  it 
follow  that  he  will  be  able  to  reason  successfully  in  a  busi- 
ness situation  which  is  pretty  largely  social?  We  know 
that  there  are  many  business  men  who  would  not  score  as 
high  as  the  mathematicians  on  the  intelligence  tests  now 
in  use,  but  would  be  very  much  better  in  analyzing  a  busi- 
ness situation  than  the  college  mathematician.  McCall 
points  out  that  there  is  a  much  higher  correlation  within 
any  one  of  these  intelligences  than  between  any  two  of 
them. 
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If  these  hypotheses  are  true,  then  the  vocational  expert 
must  be  slow  to  assign  an  individual  to  a  particular  edu- 
cational level  unless  he  has  tested  him  with  tests  that 
reveal  his  level  in  that  particular  intelligence. 

The  Binet  tests  found  early  and  widespread  use  in 
juvenile  courts,  in  state  surveys  for  feeble-mindedness,  and 
in  many  other  fields  of  research.  Each  of  the  various 
fields  has  a  literature  of  its  own.  It  has  been  estimated 
that  approximately  4,000,000  pupils  were  tested  in  the 
public  schools  within  thirty  months  after  the  appearance 
of  the  first  group  intelligence  scale.  Children  of  the  inter- 
mediate and  grammar  grades  are  the  ones  most  favored 
in  this  respect.  Classifications  are  being  made  on  a  basis 
of  the  scores  made  on  these  tests.  It  is  not  claimed  that 
all  the  classifications  made  are  wise  and  will  lead  to  better 
results.  In  the  main,  however,  the  results  are  gratifying 
and  have  been  worth  while. 

The  workers  in  the  field  are  growing  more  cautious  as 
the  work  advances  and  are  warning  those  who  are  less 
familiar  with  the  limitations  of  the  tests  to  be  very  con- 
servative in  their  claims  for  them. 

Wherever  intelligence  tests  have  been  given  in  the 
schools,  they  have  shown  that  approximately  2  per  cent 
of  the  school  population  will  never  develop  beyond  the 
11  to  12-year-old  level.  The  tests  are  giving  us  much 
valuable  information  along  many  lines.  Healy  and  others 
have  pointed  out  that  courts  are  in  the  habit  of  admin- 
istering punishment  to  juvenile  offenders  without  know- 
ing very  much  about  the  mentality  of  those  whom  they 
direct.  A  very  high  percentage  of  the  social  offenders 
are  mentally  deficient.  Mental  testing  is  beginning  to  shed 
much  light  on  these  problems  and  less  injustice  will 
undoubtedly  be  done  in  the  future  in  passing  judgment 
on  our  social  offenders. 

The  use  and  abuse  of  the  tests  have  aroused  widespread 
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criticisms,  both  constructive  and  destructive.  In  any  event, 
the  criticisms  have  given  us  a  clearer  understanding  of 
what  intelligence  is  and  how  to  measure  it. 

III.  Summary  and  Evaluation  of  the  Measurement  of 
Intelligence 

In  the  foregoing  pages  we  have  noted  some  of  the  major 
problems  that  confronted  those  attempting  to  measure  in- 
telligence. We  have  also  pointed  out  some  of  the  attempts 
to  solve  these  problems  and  have  noted  some  of  the  more 
important  tests  and  scales  now  in  use.  "We  shall  now  at- 
tempt to  summarize  and  evaluate  the  movement  to  measure 
intelligence  and  bring  our  description  of  it  up  to  date 
and  also  note  the  fields  where  special  research  work  will 
probably  be  done  in  the  immediate  future. 

"We  noted  in  the  first  part  of  Chapter  III  that  the  scales, 
or  groups  of  tests  to  measure  intelligence,  have  arisen  from 
the  individual  tests.  The  mental  scales  are  merely  a  group- 
ing together  of  these  individual  tests  in  order  to  give  a 
more  general  picture  of  the  mental  make-up  of  the  indi- 
vidual. The  first  tests  were  concerned  with  the  specific 
"faculties,"  capacities,  or  abilities  of  the  mind.  They 
came  into  existence  when  the  old  "faculty  psychology" 
was  still  in  good  repute.  It  was  then  thought  that  the 
proper  way  to  measure  intelligence  was  to  choose  any 
faculty  or  trait  of  the  mind  for  investigation  and  that  the 
data  obtained  from  the  measurement  of  the  trait  chosen 
would  be  indicative  of  the  general  intelligence  of  the  in- 
dividual. It  Avas  also  thought  that  these  faculties,  or 
abilities,  functioned  somewhat  independently  of  one  an- 
other and  that  they  were  easily  isolated  for  investigation. 

It  was  soon  found,  however,  that  the  old  "faculty 
psj'chology "  idea  was  untenable  and  that,  instead  of  a 
trait  or  capacity  functioning  more  or  less  independent  of 


THE  MEASUREMENT  OF  INTELLIGENCE    137 

other  capacities,  it  was  very  closely  associated  with  them 
in  its  functioning  and  that  the  complete  isolation  of  a 
mental  trait  for  measurement  was  impossible. 

All  are  agreed  that  there  is  a  native  capacity  that  con- 
ditions all  mental  functioning.  Just  what  this  capacity  is 
and  how  it  manifests  itself  are  unsettled  questions.  It 
must  not  be  inferred,  however,  that,  because  psychologists 
cannot  define  and  completely  measure  this  native  endow- 
ment, we  cannot  turn  the  measurements  thus  far  made  to 
good  account.  Data  obtained  from  measurements  now  made 
are  not  only  of  theoretical  and  scientific  value  but  prac- 
tical value  as  well. 

We  are  just  entering  one  of  the  most  fertile  fields  of 
research  that  science  will  ever  explore.  Everything  of 
a  constructive  nature  that  man  does  is  conditioned  and 
determined  by  this  mental  capacity  we  are  attempting  to 
measure.  All  philosophy,  science,  and  art  wait  upon  its 
growth  and  development. 

Methods  Are  Yet  Crude. — Just  as  the  pioneers  enter- 
ing a  new  country  must  of  necessity  use  crude  methods 
until  explorations  are  made  and  the  needs  of  the  country 
determined,  so  the  scientists  entering  a  new  field,  with 
strange  environment,  and  with  tools  and  equipment  de- 
signed for  other  fields,  must  work  at  a  disadvantage  until 
he  has  orientated  himself  and  has  his  bearings  in  the  new 
field.  The  psychologists  are  just  beginning  to  get  their 
bearings  in  this  new  field  of  intelligence  measurements. 
They  are  beginning  to  see  that  many  of  the  implements 
used  for  measurements  in  other  fields  are  ill-adapted  to 
this  new  enterprise.  They  are  also  learning  that  certain 
other  tools  are  indispensable  in  the  explorations  and 
measurements  of  mental  traits  and  capacities.  Tests  in- 
volving merely  visual  acuity,  such  as  the  cancellation  of 
the  A's  on  a  printed  page,  for  instance,  apparently  do 
not  exercise  enough  of  this  native  endowment  to  be  of 
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much  value  in  judging  the  general  intelligence  of  an  indi- 
vidual. Hence  they  are  being  discarded  as  measures  of 
intelligence.  The  same  may  be  said  of  most  tests  func- 
tioning on  a  perceptual  level. 

Fortunately,  after  a  period  of  about  twenty  years  of 
work  in  attempting  to  measure  intelligence,  we  have  a 
symposium  by  a  group  of  thirteen  leading  American 
psychologists  on  what  intelligence  is,  how  it  may  best  be 
measured,  and  what  the  next  steps  are  in  this  field." 

Terman  in  attempting  to  answer  these  questions  in  the 
symposium  has  tersely  stated  the  situation  in  regard  to 
the  value  of  a  right  conception  of  general  intelligence. 
After  endorsing  the  conception  of  Meumann  that  we  should 
first  find  out  "what  is  demanded  of  intelligence  and  then 
analyze  the  mental  functions  which  meet  that  demand," 
he  says: 

If  we  accept  this  view  it  is  evident  that  the  important  intel- 
lectual differences  among  men  will  not  be  found  on  the  sensory, 
perceptual,  or  purely  reproductive  level.  It  is  well  known  that 
a  moron  may  be  able  to  see,  hear,  taste,  or  smell,  react  to  a 
signal,  balance  a  bicycle,  steer  an  automobile,  or  cancel  A's  about 
as  well  as  an  intellectual  genius.  The  latter  would  be  some- 
what his  superior  in  memory  for  non-sense  syllables,  would 
excel  him  more  in  logical  memory,  and  would  outclass  him  hope- 
lessly in  the  ability  to  distill  meanings  from  the  raw  products 
of  sensation  and  memory.  The  essential  difference,  therefore, 
is  in  the  capacity  to  form  concepts  to  relate  in  diverse  ways, 
and  to  grasp  their  significance.  An  individual  is  intelligent  in 
proportion  as  he  is  able  to  carry  on  abstract  thinking. 

In  answer  to  the  retort  that  some  may  make  that  he 
is  simply  singling  out  a  particular  mental  trait  for  special 
worship,  that  other  traits  are  just  as  valuable  as  abstract 
thinking,  and  that  it  is  a  kind  of  intellectual  snobbery, 

1^  " Intelhgence  and  Its  Measurements:  A  Symposium,"  Journal  of 
Educational  Psychology,  Vol.  12,  March  and  April,  1921. 
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that  holds  that  general  intelligence  is  manifested  chiefly 
by  one's  ability  to  do  abstract  thinking,  he  says:" 

Civilization  with  its  science,  art,  government,  religion,  phi- 
losophy, and  systems  of  credit,  is  unthinkable  except  as  a  product 
of  concept  elaboration  and  symbolic  thinking.  ...  It  cannot  be 
disputed  .  .  .  that  in  the  long  run  it  is  the  races  which  excel 
in  abstract  thinking  that  eat  while  others  starve,  sur\'ive  epi- 
demics, master  new  continents,  conquer  time  and  space,  and 
substitute  religion  for  magic,  science  for  taboos,  and  justice  for 
revenge.  The  races  that  excel  in  conceptual  thinking  could,  if 
they  wished,  quickly  exterminate  or  enslave  all  the  races  notably 
their  inferiors  in  this  respect.  Any  given  society  is  ruled,  led, 
or  at  least  molded  by  the  five  or  ten  per  cent  of  its  members 
whose  behavior  is  governed  by  ideas.  The  typical  pick-and- 
shovel  man  does  his  thinking  chiefly  on  sensori-motor  and  per- 
ceptual levels.  Add  a  little  more  ability  to  think  on  the  rep- 
resentative level  and  he  may  be  able  to  repair  your  automobile, 
build  you  a  house  according  to  an  architect's  specifications,  or 
nurse  you  in  illness.  Add  a  large  measure  of  ability  to  associate 
abstract  ideas  into  complex  systems  and  he  can  design  a  new 
type  of  engine,  draft  the  plans  for  a  skyscraper,  or  discover  a 
curative  serum. 

In  regard  to  the  next  step  in  research  in  the  measure- 
ment of  intelligence,  the  general  consensus  of  opinion  of 
the  men  taking  part  in  this  symposium  was  that  the 
immediate  task  is  to  refine  the  tests  and  catalogue  the 
characteristics  that  are  recognized  as  belonging  to  higher 
intelligence;  that  is,  those  elements  that  emphasize,  more 
than  any  now  existing,  deliberation  and  sustained  ra- 
tional ability.  There  must  be  a  courageous  attack  upon 
the  problem  of  measurement  of  other  than  intellectual 
factors.  The  problem  of  special  aptitudes  must  be 
attacked  and  the  general  technique  of  measurement 
improved.  While  this  programme  is  a  broad  one,  never- 
theless, the  problems  are  being  vigorously  attacked  and 
progress  is  being  made. 

"  lUd.,  pp.  127-128. 
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One  of  the  fields  that  is  receiving  special  attention  at 
the  present  time  is  the  measurement  of  the  non-intellectual 
traits  of  individuals.  The  work  of  Dr.  Downey  on  the 
''Will-Profile"  suggests  the  possibility  of  supplementing 
our  intelligence  examinations  by  objective  measures  of  the 
so-called  character -traits. 

Dr.  Haggerty  in  summing  up  the  work  of  mental 
measurements  for  the  past  year,  says:  "If  a  single  sum- 
mary phrase  were  useful  to  indicate  the  drift  of  current 
discussion,  we  might  choose  'the  inadequacy  of  intelligence' 
as  a  suitable  title. ' '  ^^ 

It  is  obvious  to  the  careful  experimenter  that  tests  of 
the  type  now  in  use  do  not  give  all  the  information  that 
we  need  to  know  about  children.  Either  the  definition 
of  intelligence  must  be  broadened  to  include  other  traits, 
or  we  must  conclude  that  there  are  traits  of  a  non-intel- 
lectual type  that  determine  to  a  large  degree  the  success 
or  failure  of  an  individual  in  life.  The  functioning  of 
the  eye  muscles  may,  for  instance,  determine  the  amount 
of  reading  an  individual  can  do,  and  thus  condition  his 
entire  life  programme. 

Industry,  which  is  apparently  beyond  the  limits  of  what 
we  call  intelligence,  has  much  to  do  with  success.  Hag- 
gerty^'' directed  an  experiment  a  few  years  ago,  the  pur- 
pose of  which  was  to  study  the  characteristics  of  50  men 
admittedly  successful  and  50  others  who  were  obviously 
failures  in  life. 

When  the  data  were  combined  for  each  of  the  successful 
men  the  result  showed  clearly  that,  in  the  combined 
opinions  of  all  the  judges,  the  quality  most  apparently 
conducive  to  success  was  industry,  which  was  defined  as 

^^  "Recent  Developments  in  Measuring  Human  Capacities,"  Journal 
of  Educational  Research,  Vol.  3,  April,  1921.  Address  delivered  before 
the  National  Association  of  Directors  of  Educational  Research  at 
Atlantic  City,  N.  J.,  March  3,  1921. 

"  Ibid.,  p.  245. 
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''thorough,  persistent,  painstaking,  enduring"  and  the 
opposite  of  "lazy,  sluggish,  mditferent,  superficial."  The 
nine  traits  ranking  next  in  order  were:  "efficiency,  atten- 
tiveness,  loyalty,  prudence,  honesty,  adaptability,  sym- 
pathy, tactfulness,  and  cheerfulness." 

The  most  of  these  traits  are  beyond  the  limits  of  what 
we  ordinarily  call  general  intelligence.  Such  non-intel- 
lectual traits  as  industry,  loj^alty,  honesty,  tactfulness, 
sympathy,  and  cheerfulness  weigh  heavily  in  favor  of 
success,  and  such  other  non-intelligent  traits  as  self- 
assertion,  pride,  conceit,  jealousy,  quarrelsomeness,  sug- 
gestibility, and  intolerance  make  their  contribution  in  the 
direction  of  failure.  Either  our  tests  of  intelligence  are 
inadequate  or  intelligence  itself  is  inadequate  to  produce 
success. 

We  have  many  cases  of  children  in  the  public  school 
with  a  high  I.Q.  who  do  very  poor  work  in  school,  and 
also  many  cases  with  a  normal  I.  Q.,  or  even  an  I.  Q.  below 
normal,  who  make  the»best  grades  in  the  class.  The  latter 
are  invariably  industrious  and  persistent.  Haggerty  re- 
ports the  case  of  a  boy  who  stood  third  in  a  class  of  60 
in  a  series  of  intelligence  tests  which  included  the  follow- 
ing: opposites,  analogies,  hard  directions,  verb-object, 
Trabue  completion,  and  the  Thorndike  Reading  Scale, 
Alpha  2.  He  was  later  examined  with  the  army  examina- 
tion A,  in  which  he  scored  325  points  placing  him  easily 
in  the  upper  10  per  cent  of  high  school  freshmen  and  the 
equal  of  many  college  students.  He  scored  179  points  on 
the  Otis  Scale;  yet  during  his  four  years  in  high  school 
only  twice  did  he  achieve  a  mark  as  high  as  C  on  a  five- 
point  scale  of  marks. 

In  contrast  with  this  student  was  a  girl  in  the  same  class 
whose  score  on  the  army  examination  A  was  231,  who 
scored  average  on  the  initial  tests  (ranking  23  in  a  group 
of  60  entering  pupils)  and  whose  I.Q.  (Terman)  was  108. 
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In  her  four  years  of  high-school  work,  only  five  times  did 
this  girl  make  a  grade  as  low  as  B  in  an  academic  subject. 
Twenty-eight  times  out  of  33  her  marks  were  A,  and  she 
was  classed  by  her  instructors  as  the  best  student  in  the 
class. 

Examples  like  those  cited  by  Haggerty  can  be  found 
in  all  large  school  systems. 

Pressey  notes  that  ''if  we  wish  to  foretell  success  in 
school  we  must  obtain  a  measure  of  school  attitude." 

Terman  insists  that  "mental  tests  should  be  supple- 
mented by  ratings  on  character  traits  and  by  educational 
tests." 

Haggerty  points  out  that  "it  is  not  at  all  probable  that 
a  perfect  measure  of  intelligence  would  give  a  perfect 
correlation  with  school  success  or  with  success  in  later  life. 
A  more  accurate  measure  of  intelligence  would  only  render 
the  inadequacy  of  intelligence  more  apparent  for  the  simple 
reason  that  success  is  not  quantitatively  coterminous  with 
intelligence  but  with  intelligence  in  combination  with  other 
significant  human  traits  not  subject  to  evaluation  by  tests 
of  the  type  currently  used  as  measures  of  intelligence."^" 

In  the  "Will-Profile"  mentioned  above,  Dr.  June 
Downey  has  attempted  to  design  a  scale  that  consists  of 
12  objective  tests  designed  to  measure  such  personal  qual- 
ities as  assurance,  flexibility,  speed  of  movement,  motor 
impulsion,  resistance,  tenacity,  coordination  of  impulses, 
freedom  from  inertia,  motor  inhibitions,  care  for  detail, 
speed  of  decision,  etc. 

The  author  claims  that  these  tests  have  considerable 
general  characterological  significance  and  that  they  can  be 
used  to  advantage  in  getting  the  general  temperamental 
pattern  of  an  individual  and  they  may  also  determine 
specific  combinations  of  traits,  and,  in  conjunction  with 

» Ibid.,  pp.  246-247. 
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intelligence  tests,  afford  in  many  situations  a  basis  for  con- 
servative prophecy. 

Healy  points  out  another  phase  of  the  measurement  of 
intelligence  that  is  worth  noting.  There  is  a  type  of  indi- 
vidual he  calls  "verbalists"  which  is  characterized  by  an 
ability  to  handle  language  above  his  ability  along  other 
lines.^^  '*0n  account  of  the  ability  of  this  type  to  handle 
language  well,  the  members  of  this  group  are  not  properly 
placed  by  the  ordinary  tests  of  social  intercourse.  The 
common  method  of  passing  judgment  on  people  is,  of 
course,  through  conversation  and  also  questions,  and  if  one 
gets  answers  that  follow  properly,  that  are  consequential 
and  coherent,  why  then  without  more  ado  one  infers  the 
answerer  to  be  practically  normal." 

It  is  interesting  to  note  that  the  city  superintendents 
and  high-school  principals  of  the  Council  of  Administra- 
tion in  the  State  of  Kansas  endorsed  a  plan  on  January 
20,  1921,  whereby  many  of  these  non-intellectual  traits  and 
traits  indicated  by  Dr.  Downey's  "Will-Profile"  are  to 
be  taken  into  consideration  in  the  awarding  of  grades  in 
the  high  schools  in  the  State  of  Kansas. 

The  definition  of  Grade  A  as  endorsed  by  that  Council 
is  given  below: 

Grade  of  A 

1.  Scholarship.  Exceeding  expectations  of  instructor. 

2.  Initiative.  Contributions  exceeding  the  assignment. 

3.  Attitude.  Positive  benefit  to  class. 

4.  Cooperation.  Forwarding  all  gToups  of  activities. 

5.  Individual  improvement.  Actual  and  noticeable. 

Educators  are  beginning  to  ask  which  is  the  more  signifi- 
cant inquiry  concerning  a  candidate  for  admission  to 
college   that   has   just  completed   a  four-year  high-school 


2^  W.  O.  Healy,  The  Individual  Delinquent  (Little,  Brown,  &  Co., 
1915),  pp.  473  et.  seq. 
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course:  ''What  subjects  has  he  previously  studied?"  or 
"What  is  his  general  intelligence  or  ability?"  The  sub- 
jects a  high-school  student  has  previously  studied  are 
significant  primarily  because  they  may  give  a  measure  of 
his  general  ability,  rather  than  because  they  indicate  a 
knowledge  of  any  particular  fact.  The  intelligence  exami- 
nation can,  therefore,  in  a  general  way,  be  valuable  in  con- 
nection with  the  selection  of  students  for  college  admis- 
sion only  as  supplementing  the  high-school  record  and  not 
as  a  substitute  for  it. 

The  question  naturally  arises  as  to  how  the  regular  room 
teacher  may  utilize  these  tests  for  more  efficient  school 
work.  We  can  speak  quite  dogmatically  and  say  that  they 
are  a  great  improvement  over  the  personal  judgment  of 
the  teachers. 

A  few  suggestions  will  be  offered  here  relative  to  the 
use  of  these  tests  by  the  regular  room  teacher. 

1.  The  teachers  should  familiarize  themselves  with  some  of 
the  standardized  intelligence  tests  now  in  use. 

2.  Care  should  be  taken  to  see  what  these  tests  are  designed 
to  test.  A  familiarity  of  the  various  traits  of  the  mind  gained 
through  the  careful  study  of  the  tests  will  quicken  the  teachers 
to  detect  particular  traits  very  early  in  the  child's  career. 

3.  If  the  children  are  given  mental  tests,  the  teacher  learns 
the  type  of  minds  with  which  she  has  to  work. 

4.  Reasonable  improvement  can  be  determined  only  after  the 
mental  capacity  of  the  individual  child  has  been  ascertained. 
If  progress  in  school  achievements  has  been  slow,  the  teacher 
has  a  good  defense  if  the  general  intelligence  is  low.  On  the 
other  hand,  if  the  children  have  high  intelligence  quotients  and 
progress  has  been  slow,  the  burden  of  the  proof  that  the  school 
was  well  tauglit  would  lie  on  the  teachtT.  Of  course,  there  are 
other  factors  that  enter  into  the  problem  that  must  be  taken 
into  consideration. 

5.  If  regular  room  teachers,  not  specially  trained  to  give  intel- 
ligence tests,  should  give  such  tests,  the  directions  for  giving  the 
tests  must  be  followed  to  the  letter  and  conclusions  and  implica- 
tions must  he  conservatively  draivn. 
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6.  Don't  treat  tests  as  "educational  curiosities."  Their  purpose 
is  to  give  information  about  the  developing  mind.  They  suggest 
educational  policies.  They  indicate  a  more  scientific  selection 
of  subject  matter, 

7.  If  we  are  to  have  better  tools  for  measuring  intelligence, 
we  must  discover  through  the  use  of  the  tools  we  now  have 
wherein  they  are  deficient. 

8.  The  spiritual  admonition  of  the  Apostle  Paul  when  he  said, 
"Prove  all  things;  hold  fast  to  that  which  is  good,"  was  never 
more  needed  in  the  field  of  religion  than  in  education. 

9.  Every  teacher  should  learn  the  principles  involved  in  test 
making  and  in  giving  tests,  not  always  for  purposes  of  diagnosis 
of  children  but  for  the  guidance  of  her  own  behavior  towards 
them. 

10.  Psychological  tests  will  enable  us  to  predict  with  a  fair 
measure  of  scientific  accuracy  the  extent  to  which  pupils  will 
avail  themselves  of  the  opportunities  which  are  set  before  them. 
They  do  not  constitute  a  psychological  clinic;  nevertheless,  they 
are  valuable  in  diagnosis.  Their  purpose  is  to  make  a  mental 
analysis  of  the  child  in  order  to  discover  the  assets  and  defects 
so  that  the  examiner  vaRj  prescribe  the  j^roper  treatment. 
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CHAPTEK  V 

THE  NEED  FOR  DEFINITE  MEASUREMENTS  OF  SCHOOL 
ACHIEVEMENTS 

Time  Consumed  in  Giving*  Examinations. — It  is  esti- 
mated that  on  an  average  each  teacher  gives  as  many  as 
twenty  examinations  each  year  and  that  it  takes  approxi- 
mately three  hours  to  give  each  examination  and  to  grade 
the  papers.^  If  we  estimate  that  there  are  600,000 
teachers  in  the  United  States,  this  would  mean  that 
36,000,000  hours  are  spent  in  giving  examinations.  There- 
fore, if  the  time  can  be  lessened  and  the  accuracy  of  the 
measures  increased,  the  effort  is  well  worth  while. 

That  the  examination  system  is  here  to  stay  is  an 
obvious  fact.  The  need  for  some  method  of  determining 
the  efficiency  of  the  educational  processes  is  so  obvious 
that  no  argument  is  needed  to  substantiate  it.  The  folly 
of  putting  children  through  educational  processes  day 
after  day  and  never  knowing  what  the  results  are  can 
be  endorsed  neither  as  a  principle  nor  as  a  matter  of 
expediency.  Schools  cannot  be  efficiently  operated  with- 
out a  system  of  examinations  any  more  than  a  business 
man  can  run  his  business  without  invoicing  his  stock 
occasionally.  The  best  known  and,  in  fact,  the  only  way 
to  determine  the  value  and  status  of  an  educational  proc- 
ess is  to  take  an  invoice  of  the  products. 

One  cannot  work  long  in  the  field  of  experimental  edu- 
cation without  coming  to  the  following  conclusions:  (1) 
Examinations  of  some  kind  are  necessary.     (2)   The  exami- 

^  "A  New  Kind  of  School  Examination,"  Journal  of  Educational 
Research,  Vol.  1,  pp.  33-46. 
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nations  must  he  of  such  a  nature  tJiat  they  may  he  given 
hy  the  regular  room  teacher.  Only  rarely  can  they  be 
administered  by  the  superintendent  in  person  or  by  some 
one  acting  for  him. 

Though  convinced  theoretically,  many  administrators 
find  it  hard  to  see  just  how  measurements  can  be  effectively 
carried  on  to  advantage  in  their  schools.  The  accusation 
is  made  that  examiners  testing  their  schools  carry  away 
the  results  and,  if  they  ever  get  back  to  the  schools,  they 
are  many  times  so  intangible  that  they  mean  nothing  as 
a  factor  in  determining  a  school  policy. 

We  must  build  our  ideal  system  of  education  syn- 
thetically, taking  the  best  methods  from  each  of  the  prev- 
alent groups  of  theories.  But  we  cannot  determine  best 
methods  until  we  measure  our  processes.  People  generallj'' 
agree  in  measuring  a  product,  if  they  can  agree  on  the 
measuring  stick. 

Measurement  in  any  department  of  natural  science  is 
the  comparing  of  a  given  magnitude  with  some  convenient 
unit  of  the  same  kind,  and  the  determination  of  how  many 
times  the  unit  is  contained  in  the  magnitude.  The  unit 
of  measurement  is  conventional.  Its  choice  is  simply  a 
matter  of  practical  convenience. 

Unfortunately,  until  quite  recently,  there  have  been  no 
objective  units  of  measurements  of  general  acceptance  in 
the  field  of  education.  A  good  teacher,  or  a  good  student, 
or  a  good  school  was  purely  a  matter  of  judgment  with 
no  commonly  accepted  measuring  device  to  verify  the 
judgment  made. 

Attitude  of  Teachers  and  Pupils  Toward  Examinations. 
— Generally  speaking,  most  teachers  and  pupils  dislike 
examinations.  Since  this  feeling  is  so  general  and  since 
so  much  has  been  said  derogatory  of  examination,  the  sub- 
ject challenges  us  to  a  careful  study  as  to  the  causes  of 
this  state  of  affairs.     Are  examinations  in  themselves  just 
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naturally  repulsive  and  obnoxious  to  pupils  and  teachers? 
Do  pupils  dislike  to  have  their  educational  achievements 
measured?  Each  of  these  questions  must  be  answered  in 
the  negative.  There  is  no  part  of  the  educational  process 
that  pupils  like  better  than  to  have  an  unbiased  statement 
as  to  their  school  achievements.  There  is  nothing  funda- 
mentally distasteful  about  an  examination  either  for  the 
teacher  or  pupil.  Yet  the  examinations  as  now  given  are 
not  satisfactory.  Since  they  are  necessary  and  are  not 
intrinsically  and  fundamentally  repulsive,  the  cause  of 
their  dislike  must  be  referred  to  one  or  more  of  the  follow- 
ing: (1)  the  methods  of  devising  the  examination;  (2) 
the  methods  of  administering  the  examination;  or  (3)  the 
methods  of  scoring  the  papers.  Improvements  along  one 
or  all  of  these  lines  will  help  to  remove  the  obnoxious 
features  from  examinations  and  make  them  a  pleasure  in- 
stead of  a  piece  of  drudgery  that  must  be  tolerated. 

School  achievement  tests,  discussed  in  this  chapter  and 
Chapters  VI  and  VII  are  nothing  more  than  improved 
examinations.  It  is  confidently  believed  by  the  writer  that 
improved  methods  of  devising  and  administering  examina- 
tions and  scoring  examination  papers,  such  as  we  shall 
have  eventually  in  the  form  of  standard  tests,  will  make 
contests  in  school  achievements  as  interesting  as  athletic 
contests  are  at  the  present  time.  There  is  no  doubt  but 
that  the  case  of  scientific  measurements  has  been  argued 
and  won.  Of  course,  the  fundamental  thing  in  school 
achievement  tests  is  not  to  provide  mere  entertainment  but 
to  devise  a  system  of  measurements  that  really  measures. 
Nevertheless,  if  the  general  attitude  of  teachers  and  pupils 
regarding  examinations  can  be  changed  so  that  they  may 
be  looked  upon  as  a  pleasure  instead  of  drudgery  as  they 
are  considered  now,  attempts  at  improving  them  are  worth 
while. 

That    both    teachers    and    pupils    need    "professional 
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hurdles,"  quantitatively  stated  goals,  set  before  them, 
against  which  their  developing  effectiveness  may  be  fre- 
quently checked  up,  is  a  recognized  principle  in  education. 
It  is  one  method  of  keeping  them  up  to  optimum  efficiency. 

The  Marking  rSystem  now  in  Vog-ue. — If  there  is  any 
one  phase  of  school  work  in  which  both  the  teachers  and 
the  public  apparently  have  had  an  abiding  faith,  it  has 
been  in  the  ability  of  teachers  to  express  definitely  and 
concisely  in  per  cent,  letters,  or  adjectives,  the  exact  prog- 
ress a  pupil  has  made  in  his  school  work.  As  evidence 
of  this  faith  we  have  allowed  these  marks  to  determine  the 
fitness  of  candidates  for  college;  their  eligibility  for 
athletics;  scholarships  and  fellowships,  which  amount  to 
hundreds  of  thousands  of  dollars  every  year,  have  been 
granted  with  practically  no  other  evidence  of  the  candi- 
date's fitness;  these  marks  have  determined  the  fitness  of 
candidates  for  civil  service  positions,  admission  to  Phi  Beta 
Kappa,  ''cum  lauda,"  "magna  cum  lauda,"  and  other 
special  honors  granted  certain  members  of  graduating 
classes  who  were  successful  in  getting  the  requisite  number 
of  A 's  or  H  's  or  other  high  marks  of  distinction. 

Parents  look  forward  with  a  great  deal  of  anxiety  to 
the  time  when  Johnny  will  bring  home,  in  the  form  of  a 
monthly  or  quarterly  report  card,  a  record  of  his  achieve- 
ments. A  fair  '^  sprinkling "  of  A's  and  B's  is  sufficient 
evidence  to  convince  the  average  father  or  mother  that 
Johnny's  progress  is  satisfactory,  so  with  a  great  deal  of 
pride,  but  little  real  information,  the  report  card  is  signed 
and  returned  to  the  teacher.  If  the  grades  were  poor  the 
parents  might  question  the  honesty  of  the  teacher  but 
never  the  reliability  of  the  marking  system. 

Grades  A,  B,  C,  and  D  usually  represent  degrees  of 
excellence  between  100  per  cent  and  60  per  cent  of  **  some- 
thing"; no  one  knows  exactly  what.  A  pupil's  "passing 
mark"  is  usually  a  grade  equivalent  to  75  per  cent  of 
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"something,"  and  if  he  gets  an  average  grade  of  95  per 
cent  of  this  "something,"  he  may  be  valedictorian  of  his 
class. 

If  school  marks  are  so  inefficient,  the  question  im- 
mediately arises  as  to  whether  some  "inventive  genius" 
or  "wizard"  has  invented  some  system  of  scales  and  units 
by  which  school  achievements  may  be  measured  with  the 
exactness  that  the  apothecary  compounds  his  medicines, 
or  the  mechanic  measures  the  diameter  of  a  piston  head. 

The  answer  to  this  question  is  perfectly  definite.  There 
are  no  scales  or  units  yet  invented  that  will  measure  school 
achievements  with  anything  like  the  exactness  that  it  is 
possible  to  get  in  the  physical  sciences.  It  should  be  noted, 
however,  that  the  precision  that  the  natural  sciences  enjoy 
was  not  developed  in  a  day.  The  sciences  have  been  evolved 
a  step  at  a  time  from  humble  beginnings  to  the  high  state 
of  perfection  they  now  enjoy.  The  Wright  brothers  did 
not  equip  their  first  aeroplane  with  a  Liberty  motor. 
Robert  Fulton's  steamboat  would  be  a  sorry-looking  spec- 
tacle beside  the  great  floating  palaces  that  now  cross  the 
Atlantic  in  three  or  four  days;  and  Stevenson's  wheezing 
little  locomotive  would  certainly  be  at  a  disadvantage  in 
competition  with  a  "twentieth  century  limited."  Each 
of  these  great  inventions  has  had  the  most  humble  begin- 
nings, and  their  evolution  may  be  traced  a  step  at  a  time 
until  the  present  state  of  efficiency  is  reached. 

In  the  same  way  we  shall  have  to  refine  our  standards 
and  units  of  educational  measurements.  There  has  been 
too  much  absolutism  in  education  and  too  little  of  a  realism 
that  sees  the  good  and  bad  in  all  and  diminishes  the  bad 
and  augments  the  good.  If  we  adopt  this  view  we  become 
really  empirical,  living  through  each  educational  experi- 
ment to  incorporate  it  into  a  growing  treasury  of  tested 
theory,  not  deducing  success  or  failure  from  metaphysical 
or  doctrinaire  prejudice. 
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It  may  not  to  too  much  to  predict  that  twenty-five  years 
from  now  our  present  scales  and  standards  will  be  as  much 
out  of  date  as  Fulton's  steamboat.  Something  like  per- 
fection will  have  been  reached  when  we  are  able  to  con- 
struct scales  by  scientific  methods  that  can  be  applied  as 
the  foot  rule  is  now  applied,  regardless  of  time,  or  place, 
or  person. 

Scientific  Measurement  of  School  Achievements  Is  New. 
— It  is  only  within  the  last  dozen  years  that  any  real 
progress  has  been  made  in  the  scientific  measurement  of 
school  products.  In  1908  Dr.  C.  W.  Stone,  a  student  under 
Thorndike,  published  his  arithmetic  test,  and  the  following 
year  Thorndike  presented  his  scale  for  handwriting  before 
the  American  Association  for  the  Advancement  of  Science, 
then  meeting  in  Boston,  These  two  tests  represent  the 
real  beginnings  of  the  scientific  measurement  of  educational 
products.  Thorndike  has  been  called  the  father  of  the 
educational  measurement  movement  in  America,  and  Dr. 
J.  M.  Rice,  mentioned  in  the  introductory  chapter,  the 
grandfather.  Inspired  by  the  work  of  Rice,  Stone,  and 
Thorndike,  Courtis  soon  followed  with  an  arithmetic  test 
in  the  four  fundamentals  which  went  through  a  number 
of  revisions  until  it  took  the  present  form  now  knowii 
to  us  as  The  Courtis  Standard  Research  Tests  in  Arith- 
metic, Series  B.  At  the  present  time  the  number  of 
standard  educational  tests  that  have  been  developed  for 
the  various  subjects  is  probably  somewhere  between  100 
and  150. 

In  spite  of  the  fact  that  practically  every  educational 
meeting  of  note  for  the  last  ten  years  has  devoted  a  greats 
deal  of  the  time  to  the  subject  of  the  scientific  measure- 
ment of  school  achievements,  and  also  the  fact  that  educa- 
tional literature  is  replete  with  discussions  of  the  new 
movement,  yet  at  the  present  time  most  teachers  accept 
the  fact  of  educational  measurements  in  a  passive  way, 


NEED   FOR   DEFINITE   MEASUREMENTS     153 

and  a  large  majority  have  never  made  use  of  these  new 
educational  tools. 

On  the  other  hand,  there  is  a  minority  group  of  teachers 
who  have  looked  upon  these  tests  and  scales  as  instruments 
for  refining  their  cruder  processes,  and  they  are  making 
rapid  strides  in  this  direction.  The  fault,  however,  does 
not  lie  altogether  with  the  teachers.  The  administrative 
machinery,  both  from  the  standpoint  of  the  state  and  local 
district,  is  organized  and  operated  on  the  old  system. 
Grades  are  still  recorded  and  promotions  made  on  the  old 
basis.  An  administrative  change  must  be  brought  about 
before  the  new  movement  has  the  complete  right  of  way. 

Those  in  the  Profession  Must  Take  the  Initiative  for 
Improvement. — Professional  improvement  must  come 
from  within  and  not  from  pressure  without  the  profession. 
The  medical  profession  has  not  advanced  because  the 
lawyer,  or  the  minister,  or  the  business  man  forced  up 
their  standards.  They  were  forced  up  by  the  leaders  in 
the  profession.  School  management  will  never  become  more 
scientific  than  the  teachers  themselves  make  it.  The  more 
teachers  that  may  be  induced  to  accept  improved  methods, 
the  sooner  the  teaching  profession  will  reach  a  stage  where 
teachers  may  be  held  morally,  if  not  criminally,  liable  for 
malpractice  just  as  they  do  in  the  medical  profession.  Be- 
fore this  can  be  done  teachers  must  be  thorouglily  con- 
vinced that  the  old  methods  are  inadequate  and  that  the 
new  ones  offer  better  opportunities  so  that  they  will  not 
hesitate  to  adopt  the  new.  Of  course  we  cannot  expect  a 
complete  transition  over  night.  Changes  have  not  been 
wrought  in  other  sciences  that  way,  and  progress  is  pretty 
much  the  same  in  all  sciences.  The  "arts  of  healing," 
for  instance,  of  our  ancestors,  the  product  of  ages  of  selec- 
tive effort,  have  given  way  to  the  modern  science  of 
medicine.  However,  some  people  still  cling  to  the  dogmas 
and  cures  of  the  older  medicine  as  to  cherished  heirlooms. 
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Since  the  transition  from  magic  and  faith  cures,  the  med- 
ical profession  has  become  so  strongly  entrenched  by  the 
accumulation  of  scientific  facts  that  it  yields  to  no  social 
force.  Even  the  military  takes  orders  from  these  men  of 
science. 

There  are,  of  course,  new  fields  being  explored  where  it 
is  still  a  matter  of  opinion  as  to  what  procedure  is  best; 
but  along  with  these  there  is  a  procedure  in  some  of  the 
more  familiar  fields  that  is  so  definite  that  any  departure 
from  its  makes  the  practitioner  liable  to  a  fine  for  mal- 
practice. A  more  profound  insight  into  scientific  methods 
and  a  greater  sensitiveness  of  our  shortcomings  are  the 
twin  forces  that  are  operative  in  convincing  men  that 
human  incapacity,  suffering,  and  waste  can  be  reduced  and 
life  made  better  through  more  purposeful  use  of  scientific 
knowledge  as  it  becomes  more  accessible. 

Purposes  of  Educational  Tests  Are  Not  Generally 
Understood. — The  work  in  educational  reform  has  been 
hindered  a  great  deal  by  a  lack  of  understanding  as  to 
what  the  purposes  of  tests  are.  Monroe  cites  the  case  of 
a  school  superintendent  in  a  city  of  more  than  50,000 
population  stating  that  he  did  not  believe  tests  had  much 
merit.  His  reasons  were  that  he  had  given  the  Courtis 
Standard  Research  Tests  in  Arithmetic  and  the  children 
did  not  do  any  better  after  the  tests  were  given  than 
they  did  before.  Just  as  though  standard  tests  were 
teaching  devices  that  would  improve  their  arithmetic 
ability.  That  is  analogous  to  the  case  of  a  mother  who, 
being  anxious  for  her  baby  to  gain  in  weight,  said  that 
she  would  not  weigh  her  baby  any  more  because  weigh- 
ing it  from  time  to  time  did  not  increase  its  weight. 

The  Problem  to  Be  Solved. — The  first  problem  that  con- 
fronts one  attempting  to  measure  school  achievements  is 
that  of  finding  some  convenient  units  of  measure  to  apply 
to  the  achievement  in  question.     If  one  wants  to  know 
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the  length  of  a  board  or  the  width  of  the  street,  there 
are  several  convenient  units  with  which  to  measure  them. 
They  may  be  measured  in  feet,  inches,  yards,  meters, 
centimeters,  or  other  derived  units.  If  one  wants  to  know 
how  heavy  a  thing  is,  there  are  well-defined  units  at  his 
disposal  such  as  the  ounce,  pound,  gram,  and  kilogram. 
Temperature  is  measured  in  degrees,  electricity  in  kilo- 
watt-hours, etc. 

When  one  attempts  to  measure  the  handwriting  of  an 
individual,  or  how  well  he  can  draw  or  spell  or  how  well 
he  can  read  and  cipher,  there  are  no  such  convenient 
units  at  his  disposal.  One  of  the  first  problems  to  be 
solved,  therefore,  by  those  desiring  to  measure  school 
achievements  is  to  devise  scales  and  units  that  may  be 
used  in  measuring  these  products.  In  linear  measure  and 
measures  of  weight,  the  steps  on  the  scale  are  the  same, 
that  is,  the  difference  between  1  pound  and  2  pounds  is 
the  same  as  the  difference  between  21  pounds  and  22 
pounds,  or  between  48  and  49  pounds.  The  question 
immediately  rises  as  to  whether  scales  for  measuring  edu- 
cational products  may  be  made  and  used  as  the  above- 
mentioned  scales  are  used  in  other  sciences. 

What  School  Achievement  Tests  Measure. — The  first 
thing  to  be  noted  is  what  a  test  measures.  Professor 
Courtis  has  given  us  three  terms  which  are  helpful  in  bear- 
ing this  thought  in  mind.  These  terms  are:  (1)  capacity, 
or  the  original  intellectual  endowment  of  the  educand  when 
he  enters  school;  (2)  ability,  which  is  capacity  plus  train- 
ing; and  (3)  performance,  or  that  which  the  individual 
actually  does  on  a  given  test.  It  is  a  rare  case  when  one's 
ability  is  equal  to  his  capacity.  That  is,  the  native  endow- 
ment is  such  that,  generally  speaking,  it  might  have  been 
cultivated  into  more  ability.  It  is  also  true  that  perform- 
ance is  usually  less  than  ability.  It  is  a  rare  case  when 
one  is  taking  a  test  that  all  the  ability  is  utilized.    When 
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an  eighth-grade  boy  takes  a  test  in  addition,  for  instance, 
and  works  as  hard  and  rapidly  as  he  thinks  it  is  possible, 
he  could,  no  doubt,  add  another  problem  to  his  credit  if 
the  reward  offered  were  made  large  enough.  In  school 
achievement  tests,  we  are  testing  performance  and  not 
capacity  or  ability.  It  should  be  noted  that  the  word 
"performance"  is  used  here  in  a  different  sense  from 
that  used  in  discussing  tests  of  intelligence.  There  it  was 
used  in  a  somewhat  restricted  sense,  meaning  activities 
which  do  not  depend  on  the  ability  to  read. 

Experimental  Evidence  to  Shovy  that  School  Marks  Are 
Inadequate. — As  was  indicated  in  the  introductory  chap- 
ter, one  of  the  first  things  that  must  be  done  to  make 
measurements  of  educational  products  more  nearly  exact 
is  to  convince  teachers  that  the  old  marks  now  being  used 
are  inadequate.  It  seems  fair  to  assume  that  with  an 
efficient  measuring  device  or  yardstick  any  number  of 
competent  persons  measuring  the  same  thing  ought  to  get 
approximately  the  same  results.  It  is  not  expected  that 
the  results  will  be  exactly  the  same  because  exact  measure- 
ments are  impossible  in  any  field.  If,  for  instance,  116 
competent  persons  were  given  a  measuring  stick  and  asked 
to  measure  the  length  of  a  board  the  length  of  which  was 
known  to  be  somewhere  between  0  and  100  inches,  and, 
if  the  results  obtained  varied  from  28  to  92  inches,  we 
would  conclude  that  there  was  something  wrong  witli  the 
measuring  device.  If  the  measuring  were  done  with  a  foot 
rule  divided  into  inches,  a  variation  of  an  inch  or  two 
would  be  the  maximum  we  would  expect. 

Starch  and  Elliott  investigated  the  accuracy  with  which 
116  teachers  were  able  to  measure  the  merits  of  a  geometry 
examination  paper.  A  facsimile  reproduction  of  the 
examination  paper  was  made  and  sent  to  each  of  the  high 
schools  included  in  the  North  Central  Association  of 
Colleges  and  Secondary  Schools  with  the  request  that  the 
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geometry  teachers  in  the  various  schools  mark  it  on  a 
basis  of  100  per  cent.  One  hundred  and  sixteen  teachers, 
in  as  many  schools,  complied  with  the  request.  The  grades 
ranged  from  28  to  92  per  cent.  The  conditions  are 
analogous  to  those  mentioned  above  in  measuring  the  board. 
The  supposition  is  that  the  teachers  asked  to  measure  the 
merits  of  the  paper  are  competent  persons  or  they  would 
not  have  been  employed  to  teach  mathematics  in  the  high 
schools.  Another  factor  that  deserves  special  attention  is 
the  fact  that  a  geometry  examination  paper  is  supposed 


28  53   55  60  65  70  75  80  65  90 

Figure  II.    distribution  op  marks  assigned  to  one  geometry  paper 
BY  116  teachers 

to  be  of  such  a  nature  that  it  may  be  graded  with  a  much 
higher  degree  of  accuracy  than  an  examination  in  litera- 
ture, history,  or  reading,  for  instance.  Figure  II  shows  the 
distribution  of  marks  as  they  were  compiled  from  the  re- 
ports of  this  test.  Two  of  the  marks  were  above  90  while 
one  was  below  30.  Thirteen  teachers  gave  the  paper  a  grade 
of  75  per  cent.  Twenty-seven  marks  were  above  80  while 
20  others  were  below  60.  Considering  a  grade  of  75  per 
cent  as  a  "passing  mark,"  47  of  the  116  teachers  con- 
sidered the  paper  to  have  sufficient  merit  "to  pass,"  while 
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69  thought  the  paper  not  worthy  of  a  passing  mark.  This 
gives  a  range  of  more  than  60  per  cent.  Two  or  three 
inches  would  be  the  maximum  variation  we  would  expect 
in  the  case  of  measuring  the  board  above  mentioned. 

The  case  of  measuring  the  board  would  have  been  more 
nearly  analogous  to  that  of  measuring  the  examination 
paper  if  instead  of  using  a  rigid  foot  rule  we  had  used  a 
piece  of  elastic.  In  that  case  it  would  all  depend  on  the 
tension  of  the  elastic  as  to  what  results  were  obtained.  It 
is  as  difficult  even  for  the  same  teacher  to  grade  a  paper 
twice,  with  a  sufficient  time  interval  so  that  she  does  not 
remember  her  former  grade,  and  give  it  the  same  score 
each  time,  as  it  would  be  to  make  two  measurements  of 
a  board  with  a  piece  of  elastic  and  get  identical  results 
because  she  forgets  what  the  former  tension  of  the  elastic 
was. 

Teachers'  unaided  judgments  are  as  elastic  as  the  piece 
of  rubber,  and  the  purpose  of  the  new  tests  and  standards 
is  to  give  the  marks  some  fixity  so  they  will  be  more  nearly 
constant.  If,  by  these  new  standards,  the  variation  can 
be  reduced  by  even  one-half,  so  that  in  the  case  of  the 
geometry  paper  instead  of  having  a  variation  in  the  scores 
of  more  than  60  per  cent  (from  28  to  92,)  it  may  be  re- 
duced to  30  per  cent,  or  even  less,  a  wonderful  change  for 
the  better  will  have  been  wrought. 

Starch  and  Elliott  not  only  investigated  the  ability  of 
geometry  teachers  to  mark  geometry  papers  but  made 
similar  tests  in  history  and  English.  In  all  three  cases 
the  variations  were  very  great.  Tlie  papers  in  each  case 
were  presumedly  graded  by  experts  in  the  various  fields. 

Marking-  System  Inefficient  because  It  Does  Not  Indi- 
cate Progressive  DegTees  of  Merit. — If  one  were  asked  to 
grade  a  composition,  one  of  the  first  things  he  would  want 
to  know  would  be  what  age,  grade,  or  class  of  pupil  wrote 
it.    If  told  it  was  written  by  a  pupil  in  the  second  grade. 
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a  boy  six  or  seven  years  old,  it  might  get  a  mark  of  95 
or  100  per  cent,  but  if  told  that  it  was  written  by  an 
eighth-grade  pupil,  a  mark  of  60  per  cent  might  be  given. 
Now  it  is  evident  that  the  composition  has  the  same  merit 
whether  written  bj'  a  second-grade  pupil  or  a  college 
graduate.  Instead  of  having  one  scale  for  grading  com- 
positions the  person  grading  the  paper  uses  a  different 
scale  for  every  grade  in  school.  A  boy  may  get  95  per 
cent  in  his  composition  work  from  the  first  grade  to  the 
time  he  graduates  from  college.  Since  it  is  obvious  that 
he  can  write  a  better  composition  in  college  than  he  can 
when  he  is  only  a  second-grade  pupil,  there  should  be  some 
way  of  expressing  in  definite  units  of  accomplishment  the 
advancement  made. 

Definition  of  a  Scale. — It  is  to  correct  this  defect  that 
educational  scales  are  being  devised.  The  word  "scale" 
comes  from  the  Latin  word  scala,  which  literally  means  a 
ladder,  or  a  staircase,  a  long  straight  article  divided  into 
a  series  of  equal  steps  and  readily  lending  itself  to  use 
as  a  measure.  In  the  physical  sciences  the  scale  reaches 
from  the  lowest  to  the  Jiighest.  A  scale  for  measuring 
weight  reaches  from  the  lowest,  or  smallest  amount  we 
desire  to  measure,  to  the  highest,  or  greatest.  A  scale  for 
measuring  length  or  time  reaches  from  zero  to  some  upper 
limit  as  high  as  we  desire  to  go.  As  used  in  education 
it  is  thought  of  as  a  linear  rule  extending  from  the  worst 
to  the  best ;  or  from  an  educational  product  of  least  merit 
to  one  of  greatest  merit ;  from  problems  of  least  difficulty 
to  those  of  greatest  difficulty,  and  so  on.  Measurement 
is  in  terms  of  relative  merit,  showing  whether  a  given 
product  or  performance  represents  an  achievement  that  is 
halfway  along  the  scale  from  worst  to  the  best  or  at  a 
point  80  per  cent  of  the  distance  from  the  zero  point  to 
the  high  end,  or  some  other  definite  location. 

We  are  apt  to  ignore  the  relative  worth  of  things.  When 
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we  attack  a  standard  in  school,  we  set  absolute  perfection 
as  our  own  standard.  We  fail  to  consider  the  fact  that 
the  final  increment  of  any  subject  is  frequently  obtained 
at  an  expenditure  of  energy  out  of  all  proportion  to  its 
worth.  Scales  will  help  in  discovering  what  it  costs 
in  time,  energy,  and  practice  to  reach  the  various  standards 
set.  We  shall  expect  that  curricula  may  be  constructed 
on  the  basis  of  aims  or  objectives  that  are  scientifically 
determined  and  are  not  the  chance  product  of  personal 
preference  and  opinion.  The  function  of  the  scale  is  to 
take  the  score  resulting  from  the  test  and  interpret  it  in 
terms  of  relative  merit. 

The  test  is  analogous  in  many  respects  to  the  pea,  or 
sliding  weight,  on  the  scale  beam  and  is  used  to  determine 
the  exact  location  of  the  individual  on  the  scale.  IMaking 
a  test  too  easy  would  not  determine  the  position  of  an 
individual  on  the  scale  any  more  than  setting  the  pea  at 
10  pounds  would  determine  the  weight  of  a  body  weigh- 
ing approximately  100  pounds. 

Just  as  one  estimates  the  weight  of  an  article  by  its  size 
or  by  lifting  it  and  sets  the  pea  at  the  estimated  weight 
on  the  scale  beam,  so  the  test  must  be  arranged  so  that 
the  point  reached  on  the  educational  scale  will  lie  within 
the  range  of  the  test.  If,  for  instance,  aU  the  problems 
are  solved  in  a  speed  test  in  arithmetic  before  time  is  called, 
one  would  not  know  whether  the  upper  limit  of  a  child's 
ability  had  been  reached  any  more  than  he  would  Imow 
the  weight  of  a  thing  if  he  were  to  set  the  pea  at  100 
pounds  on  the  scale  beam  and  did  not  move  it  above  or 
below  that  mark  to  see  jf  the  article  would  weigh  more  or 
less  than  100  pounds. 

Progress  from  grade  to  grade  cannot  be  measured 
accurately  if  the  per  cent  system,  as  now  used,  is  to  be 
employed.  The  Thorndike  Handwriting  Scale,  while  in- 
efficient in  many  ways,  possesses  the  essential  character- 
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istic  of  indicating  degrees  of  merit  from  the  lowest  to 
almost  perfection.  The  various  specimens  of  handwriting 
start  with  Quality  4,  which  is  recognized  as  handwriting 
but  almost  entirely  illegible,  and  go  to  Quality  IS,  which 
is  well  on  the  way  towards  perfection.  The  quality  of  a 
specimen  of  handwriting  is  determined  by  sliding  the 
specimen  to  be  measured  along  the  scale  until  that  quality 
is  reached  which  most  resembles  the  sample  in  question. 
The  age  or  grade  of  the  writer  has  nothing  to  do  with  the 
quality.  An  eighth-grade  boy  and  a  first-grade  boy  may 
write  specimens  of  exactly  the  same  quality. 

An  Ideal  Scale  Must  Have  Equal  Steps  and  Each  Step 
Must  Bear  a  Definite  Relation  to  the  Zero  Point. — It  was 
indicated  in  the  introduction  that  our  present  marking 
system  follows  no  known  rules  of  mathematics.  A  grade 
of  80  per  cent  does  not  mean  that  the  product  is  twice  as 
good  as  that  given  a  grade  of  40  per  cent,  and  75  per  cent 
is  not  one  and  a  half  times  as  good  as  a  grade  of  50  per 
cent,  because  there  is  no  zero  point  established  to  which 
these  marks  bear  a  definite  relation.  Our  new  educational 
scales  are  built  on  the  same  plan  as  the  scales  for  length 
and  weight  in  the  physical  sciences.  In  order  to  obtain 
this  characteristic  of  having  successive  steps  on  the  scale 
indicate  progressive  values  increasing  by  equal  amounts, 
the  new  scales  are  based  on  the  theory  of  the  so-called 
normal  distribution  of  intellectual  ability.  This  distribu- 
tion is  represented  graphically  by  a  bell-shaped  curve,  or 
it  is  like  the  cross-section  of  a  pile  of  sand  dumped  from 
a  cart.  The  following  illustrations  of  a  normal  distribu- 
tion surface  will  help  clarify  our  conception  as  to  how  the 
new  educational  scales  are  made. 

If  one  were  to  pluck  all  the  leaves  from  an  oak  tree 
and  arrange  them  in  a  long  line  according  to  their  lengths 
so  that  the  shortest  leaves  would  be  at  the  left  end  of  the 
line  and  the  longest  ones  at  the  right,  he  would  find  that 
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he  had  very  few  exceptionally  short  leaves,  a  great  number 
of  medium  length,  and  very  few  exceptionally  long  ones. 
If  these  were  represented  by  a  surface  of  frequency,  the 
diagram  would  be  the  bell-shaped  curve  mentioned  above 
(see  Fig.  I,  page  77).  Now,  if  the  number  of  leaves  of 
various  lengths  were  represented  by  the  height  of  the  curve 
above  the  base  line,  at  the  extreme  left  the  curve  would 
come  very  close  to  the  base  line  because  there  are  very 
few  leaves  exceptionally  short.  As  the  leaves  grow  pro- 
gressively longer  their  number  also  increases,  and  the  curve 
rises  rapidly  from  the  base  line,  first  concave  then  convex, 
until  the  peak  is  reached.  It  then  descends  toward  the 
base  line  making  a  symmetrical  curve. 

If  10,000  adult  males,  all  belonging  to  the  same  race, 
were  chosen  at  random  and  lined  up  in  the  order  of  their 
heights,  the  short  ones  at  the  left  end  of  the  line  and  the 
tall  ones  at  the  right,  their  distribution  would  approximate 
very  closely  that  mentioned  above  in  reference  to  the  leaves. 
That  is,  there  would  be  a  few  exceptionally  short  ones,  a 
great  many  of  medium  height,  and  very  few  giants.  The 
weights  of  individuals,  the  strength  of  their  grip,  and,  in 
fact,  most  of  their  physical  traits  are  found  to  obey  the 
same  law. 

The  question  then  arises  as  to  whether  the  intellectual 
traits  of  individuals  distribute  themselves  in  the  same  way. 
It  has  been  found  that  if  tests  are  given  in  any  grade  as 
the  fifth,  for  instance,  in  the  subject  of  spelling,  a  few 
exceptionally  poor  spellers  are  found,  a  large  group  of 
spellers  with  median  ability,  and  a  very  small  group  of 
exceptionally  good  spellers.  The  same  distribution  is 
found  if  any  considerable  number  of  children  in  any  grade 
are  tested  in  any  subject.  In  other  words,  the  same  law  that 
operates  in  the  physical  and  biological  sciences  is  also 
operative  in  reference  to  intellectual  traits. 

The  problem  of  scientific  scale  building  is  further  sim- 
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plified  by  the  fact  that  mathematical  laws  concerning  the 
characteristics  of  the  surface  of  normal  distribution  have 
been  most  accurately  determined,  and  by  the  application 
of  those  laws  it  is  possible  to  locate  and  determine  the  steps 
on  the  different  scales  for  school  achievement. 

Now  since  it  is  found  that  intellectual  traits  distribute 
themselves  according  to  a  normal  or  symmetrical  distri- 
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Figure  III.  above,  distribution  of  marks  in  English  and  history 
in  the  university  high  school  of  the  university  of  chicago. 
Below,  distribution  of  marks  of  two  teachers  in  the  same 
DEPARTMENT  {after  Johnson). 

bution,  if  one  were  to  examine  the  grades  given  by  teachers 
and  found  they  were  distributed  in  some  other  way  than 
according  to  a  normal  distribution  of  frequency,  it  would 
be  perfectly  legitimate  to  question  the  marking  system. 

An  investigation  of  this  kind  was  made  by  F.  W.  John- 
son, principal  of  the  University  High  School  of  the  Uni- 
versity of  Chicago  of  the  grades  given  for  the  school  years 
1907-8  and  1908-9.     Figure  III  shows  the  distributions 
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of  the  grades.^  It  may  be  seen  that  the  distribution  of 
grades  here  do  not  conform  to  a  normal  distribution  sur- 
face. 

Another  investigation  which  throws  some  light  on  the 
inadequacy  of  school  marks  is  that  made  by  Dr.  F.  J. 
Kelly.  In  1913,  Dr.  Kelly  made  an  investigation  in  four 
ward  schools  in  Hackonsack,  N.  J.  to  determine  the  marks 
given  sixth-grade  children,  and  compared  these  marks  with 
those  received  when  they  went  to  a  common  departmental 
school  for  seventh  grade  work.  A  subject  such  as  arith- 
metic, which  was  taught  in  the  ward  schools  by  four 
teachers,  is  now  taught  by  one.  That  is,  by  the  depart- 
mental plan  one  teacher  had  the  arithmetic,  another  the 
language,  a  third  the  history,  and  so  on.  This  gave  an 
opportunity  to  check  up  the  grades  and  find  out  if  a  "G" 
(Good)  in  one  ward  school  meant  the  same  as  a  "G"  in 
another.  Since  all  the  ward  schools  were  under  the  same 
management,  being  a  part  of  one  system,  we  would  expect 
a  "G"  in  one  school  to  mean  the  same  as  a  ''G"  in  an- 
other.    This  condition,  however,  did  not  obtain. 

Kelly  found  that  for  work  which  the  teacher  in  school 
"C"  (one  of  the  ward  schools)  would  give  a  mark  of  "G" 
(Good)  in  language,  penmanship,  or  history,  the  teacher 
in  school  ''D"  (another  ward  school)  would  give  less  than 
a  mark  "F"  (fair).^ 

More  Exact  Measurements  Will  Make  Education  a 
Science. — The  application  of  scientific  measurements  to 
school  products  is  doing  more  to  make  education  a  science 
than  any  other  contributing  cause.  It  is  giving  education 
tools  with  which  to  work.  In  this  field  we  are  making 
wonderful  strides.  The  soil  is  virgin;  hence  it  will  take 
educators  a  long  time  to  put  it  on  a  plane  with  the  other 

^School  RcHew,  Vol.  19,  pp.  13-24. 

'  F.  J.  Kelly,  Teachers'  Marks,  Teachers  College  Contribution  to 
Education,  No.  66,  1913,  p.  7. 
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sciences.  For  instance,  a  few  years  ago  a  reading  test 
seemed  impossible;  to-day,  we  have  mastered  the  distinc- 
tion between  oral  and  silent  reading.  We  have  good 
methods  of  measuring  some  of  the  more  common  types  of 
deficiencies,  and  we  know  the  rate  of  progress  which  is 
normal  in  the  more  obvious  phases  of  interpretation.  The 
advantages  of  tests  are  in  the  same  line  as  the  advantages 
of  the  thermometer  over  saying,  ''The  heat  is  stifling,"  "It 
is  very  hot, "  "It  is  boiling  hot, ' '  etc. ;  or  of  saying,  ' ' He 
is  the  tallest  person  I  ever  saw,"  or  "the  biggest  in  the 
state  of  Massachusetts." 

It  may  not  be  too  much  to  predict  that  some  day  concen- 
tration of  attention,  ability  to  attack  various  kinds  of  prob- 
lems, clearness  of  insight,  power  of  inference  in  various 
fields,  and  other  abilities  will  be  measured.  An  educational 
product,  as  a  composition,  is  usually  a  complex,  and  its 
measurement  is  more  like  measuring  a  house  or  an  elephant 
than  measuring  a  length  or  a  volume.  It  must  not  be 
expected,  therefore,  that  a  thing  may  be  completely 
described  quantitatively  when  it  is  measured  once. 

Supervision  Improved  as  Ability  to  Measure  Increases. 
— Standard  tests  will  greatly  improve  our  supervision. 
Our  children  will  be  more  rapidly  and  effectually  taught. 
Inefficient  teachers  will  no  longer  be  able  to  hide  behind 
our  ignorance.  Dubious  aims  of  education  and  aims  too 
remote  to  be  effective  to  the  general  practitioner  will  be 
replaced  by  goals  that  are  in  sight,  by  a  motive  which  is 
dynamic  and  energizing,  and  by  an  appeal  which  will 
spur  pupils  on  to  a  greater  effort.  The  pupil  wants  to 
know  how  he  is  getting  along.  He  wants  to  know  where 
he  stands  in  reference  to  an  impartial  standard.  He  wants 
to  know  if  his  performance  each  succeeding  day  and  week 
brings  him  a  little  nearer  the  coveted  goal.  The  teacher 
is  eager  to  know  what  degree  of  success  her  efforts  have 
won  as  measured  by  the  quality  of  the  work  done  by  her 
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pupils.  It  must  be  borne  in  mind,  of  course,  that  an  edu- 
cational product  is  a  complex.  It  is  the  resultant  of  a 
great  many  causes,  of  which  the  school  is  but  one.  The 
home,  the  native  capacity,  the  amount  of  study  done  by 
the  pupil,  the  influence  of  the  street,  the  playground,  the 
state  of  health  of  the  educand,  and  many  other  factors 
enter  into  this  complex.  To  isolate  and  measure  the  school 
factor  is  a  difficult  task  to  perform.  One  must  be  very 
careful  about  drawing  conclusions  too  hastily  when  schools 
are  compared.  Suppose  for  instance,  school  A  were  com- 
pared with  school  B  in  the  four  fundamentals  of  arith- 
metic and  the  median  score  for  school  A  was  three  prob- 
lems more  than  school  B.  Teachers  must  not  draw  the 
conclusion  that  the  difference  was  caused  exclusively  by 
better  teaching  in  school  A.  It  may  be  that  more  time 
was  devoted  to  the  four  fundamentals  in  arithmetic  in 
school  A  than  school  B.  Or  the  native  capacity  of  chil- 
dren in  the  former  school  may  have  been  much  better 
than  that  in  the  latter;  they  may  have  studied  harder; 
they  may  have  received  more  help  and  encouragement  from 
home;  the  general  status  of  health  may  have  been  better 
than  in  school  B.  Along  with  these  factors  may  have  been 
the  fact  that  school  A  had  better  teachers  and  better  teach- 
ing methods  than  school  B.  Or,  the  opposite  may  have 
been  true.  Teaching  methods  cannot  be  compared  in  this 
way.  If  one  wanted  to  compare  the  merits  of  two  methods 
of  teaching  a  certain  subject,  as,  for  instance,  the  auditory 
and  the  visual  methods  of  teaching  spelling,  the  teaching 
factor  may  be  isolated  and  measured  with  a  fair  degree  of 
accuracy  in  the  following  manner :  Take  two  classes  from 
the  same  school,  or  from  different  schools,  which  appar- 
ently have  the  same  spelling  ability  as  measured  by  some 
spelling  scale  such  as  the  Ayres  Spelling  Scale.  Combine 
the  study  period  with  the  recitation  period  and  have  one 
class  study  spelling  by  the  visual  method,  where  the  teacher 
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writes  the  words  on  the  blackboard  or  has  some  other 
way  of  presenting  them  visually.  In  the  other  class  the 
same  words  are  to  be  studied,  but  all  work  is  to  be  done 
orally  and  the  children  are  not  to  see  the  words  either 
in  script  or  print  for  purposes  of  study.  No  work  is  to 
be  done  on  the  spelling  lessons  outside  the  classroom.  At 
the  end  of  one  month  a  test  may  be  given  and  the  merits 
of  the  two  methods  determined.  These  conditions  would 
be  about  as  nearly  controlled  conditions  as  one  could  get 
in  ordinary  school-room  procedure.  Even  here  the  differ- 
ence in  native  capacities  of  the  two  classes,  the  difference 
in  the  two  teachers,  if  two  were  employed,  the  difference 
in  application  of  the  pupils  and  many  other  factors  would 
condition  the  results  to  some  extent. 

Our  Educational  Scales  Have  Been  Subjective. — The 
problem  for  scientific  education  is  to  make  our  educa- 
tional scales  objective  and  universal.  They  must  be  so 
constructed  that  there  can  be  no  misunderstanding  about 
them  when  they  are  placed  in  the  hands  of  competent 
teachers.  Educational  scales  make  it  possible  to  define 
educational  products  far  more  precisely  than  without 
them.  Just  as  in  the  physical  sciences  it  is  far  more 
nearly  exact  to  say  the  temperature  is  106  degrees 
Fahrenheit  than  it  is  to  say,  "It  is  the  hottest  day  I  ever 
saw,"  or  **The  day  is  boiling  hot,"  or  "It  is  an  awful 
hot  day,"  so  in  education  we  are  refining  our  cruder 
terms  and  making  them  more  definite.  When  an  indi- 
vidual says  it  is  the  hottest  day  he  ever  saw,  no  one  but 
the  speaker  knows  what  is  meant,  but  when  he  says  it 
is  106  degrees  Fahrenheit  any  intelligent  individual  in 
any  part  of  the  world  understands  exactly  what  is  meant. 
The  expression,  "the  hottest  day  I  ever  saw,"  is  as 
definite,  however,  as  the  85  per  cent  the  teacher  gives  a 
boy  in  penmanship,  because  no  one  knows  but  that 
teacher  what  it  means.    Before  Thorndike  made  his  hand- 
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writing  scale  educators  were  in  the  same  condition  with 
respect  to  handwriting  as  scientists  were  in  respect  to 
temperature  before  the  discovery  of  the  thermometer.  In 
that  day  it  was  not  possible  to  measure  ordinary  tem- 
perature beyond  the  cold,  cool,  warm,  hot,  and  very  hot, 
of  subjective  opinion. 

We  can  easily  measure  the  salaries  of  teachers  because 
we  have  a  scale  of  money  price.  We  can  measure  the 
amount  of  time  given  by  teachers  because  we  have  the 
best  scale  that  the  world  knows,  the  scale  of  time  divided 
into  seconds,  minutes,  hours,  etc.  The  most  abstract  thing 
in  the  world  is  a  scale  for  length,  weight,  time,  or  units 
of  temperature.  As  little  as  w^e  care  about  scales  they 
are  among  the  most  important  things  in  the  world.  The 
skill  and  courage  of  the  most  daring  seamen  that  ever 
traveled  the  seas,  millions  of  them  put  together,  do  not 
do  as  much  for  the  practice  of  navigation  as  the  mariner's 
compass  which  is  simply  a  scale  for  telling  direction. 
While  tests  limit  the  "spread-out-ness"  of  our  units  of 
measurements,  they  do  not  give  us  a  definite  point  on 
the  scale.  They  do,  however,  give  us  narrow  and  definite 
boundary  lines  within  which  the  measures  lie. 

Tests  Do  Not  Indicate  the  Cause  of  Conditions. — A 
school  achievement  test,  or  any  other  kind,  does  not  indi- 
cate what  brought  about  the  conditions  found.  It  simply 
says  that  a  certain  state  of  affairs  exists.  When  a  physi- 
cian's thermometer  shows  a  temperature  of  103°  it  does 
not  indicate,  in  the  least,  whether  the  high  temperature 
is  caused  by  typhoid  fever,  influenza,  diphtheria,  or  what 
not.  It  simply  says  that  the  patient's  temperature  is  103°. 
This  is  a  result  of  perhaps  many  causes,  and  further 
diagnosis  is  necessary  to  determine  what  brought  about 
this  condition.  In  exactly  the  same  way  a  score  made 
in  any  subject  is  the  resultant  of  many  causes,  of  which 
teaching  may  be   one.     Teachers  must  use    caution    in 
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assigning  conditions  to  definite  causes  unless  it  is 
absolutely  known  that  the  condition  was  brought  about 
by  the  cause  assigned. 

How  Standard  Tests  Differ  from  Ordinary  Examina- 
tions.— Perhaps  one  of  the  best  ways  to  show  the  differ- 
ence between  a  standard  test  and  an  ordinary  examina- 
tion is  to  note,  in  a  general  way,  the  steps  in  making  a 
standard  test.  To  the  uninitiated,  school  achievement 
tests  look  so  much  like  ordinary  examinations  that 
teachers  often  wonder  what  the  advantages  of  the  stand- 
ard tests  are  over  the  examinations. 

The  superiority  of  the  standard  test  over  the  ordinary 
examination  may  be  shown  best,  perhaps,  by  indicating 
the  problems  in  making  a  standard  test.  The  procedure 
is  somewhat  as  follows :  The  first  thing  to  be  determined 
is  the  kind  of  a  test  desired ;  that  is,  shall  the  test  cover 
several  phases  of  the  subject,  or  one?  Shall  it  be  a 
rate  test,  telling  how  much  a  pupil  can  do  in  a  given 
time,  or  shall  it  be  a  difficulty  test  answering  the  ques- 
tion "How  hard  a  problem  can  a  child  solve?"  Or  shall 
it  be  a  quality  test  answering  the  question  "How  well 
can  a  child  do  a  given  task?"  When  the  type  of  test 
has  been  chosen,  then  those  making  the  test  must  decide 
upon  the  following  points:  Shall  the  test  cover  several 
parts  of  a  subject  as,  for  instance,  the  four  fundamentals, 
common  and  decimal  fractions,  and  percentage  in  arith- 
metic, or  shall  it  be  diagnostic  and  cover  quite  completely 
just  one  phase  of  the  subject,  as  addition  or  division? 
Or,  if  the  subject  is  language,  shall  the  test  cover  in  a 
general  way  all  parts  of  speech,  or  shall  it  cover  just 
verbs  or  pronouns?  This  having  been  decided  the  next 
point  to  determine  is  the  principles  on  which  the  test 
questions  shall  be  based.  For  instance,  if  a  pronoun  test 
is  being  made,  shall  the  choosing  of  the  test  questions 
bear  some  definite  relation  to  the  type  of  errors  people 
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make  in  the  use  of  the  pronoun  forms,  or  shall  something 
else  be  taken  as  a  basis  for  choosing  the  questions?  If 
more  errors  are  made  in  the  use  of  some  pronoun  forms 
than  in  others,  shall  there  be  more  test  questions  on  these 
particular  forms,  assuming  that  the  test  questions  have 
been  selected  on  some  such  basis  as  the  above? 

The  next  questions  to  decide  are  hov/  many  questions 
or  parts  shall  go  in  the  test  and  approximately  how  much 
time  should  it  take  to  do  the  test.  The  number  of  ques- 
tions or  divisions  is  determined  in  part  by  the  subject 
matter  itself,  and  in  part  by  arbitrarily  fixing  the  length 
of  the  test.  The  approximate  length  of  the  test  having 
been  determined,  the  next  thing  is  the  actual  drafting 
of  the  questions  or  parts  and  the  submitting  of  these 
parts  to  a  representative  group  of  children  usually  from 
2,000  to  5,000  in  number.  This  is  called  the  preliminary 
test  upon  which  the  standardized  test  may  eventually  be 
based.  It  may  be,  however,  that  the  preliminary  test 
will  show  that  the  mode  of  attack  is  not  good  and  the 
whole  thing  must  be  ''scrapped"  and  a  new  mode  of 
attack  adopted.  Assuming  that  the  mode  of  attack  was 
satisfactory  and  that  the  preliminary  test  indicates  that 
the  desired  information  may  be  obtained  by  the  methods 
adopted,  the  one  making  the  preliminary  test  eliminates 
unsuitable  material,  modifies  some  of  the  items  in  the 
light  of  his  experience,  and  brings  out  the  test.  He 
usually  obtains  some  tentative  standards;  but  the  limita- 
tions of  his  time  and  money  usually  prevent  his  carrying 
this  phase  of  the  work  to  a  satisfactory  conclusion.  There 
is,  therefore,  no  test  which  does  not  require,  after  its  pub- 
lication, a  thorough  trial  in  order  to  set  up  norms  of 
performance. 

For  many  years  workers  in  the  field  of  educational 
measurements  have  been  devising  tests  until  now  we  have 
several  for  most  of  the  subjects  in  the  elementary  school. 
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Many  of  the  older  tests  have  passed  through  their  pre- 
liminary stages  and  may  be  said  to  be  satisfactorily 
standardized.  In  some  of  the  tests  little  attention  has 
been  given  to  questions  of  validity  and  the  reliability  of 
the  measures  which  the  tests  yield.  The  availability  of 
a  number  of  tests  for  each  of  the  common  school  sub- 
jects makes  urgent  a  scientific  determination  of  the 
validity  and  reliability  of  the  respective  tests  in  order 
that  one  may  have  information  on  which  to  base  a 
rational  selection. 

In  contrast  with  this  long  and  tedious  process,  the 
traditional  classroom  examination  consists  of  tasks  of 
varying  units  of  difficulty,  worked  at  for  varying  amounts 
of  time,  and  producing  results  of  varying  degrees  of 
quality.  The  examinations  do  not  measure  any  one  thing. 
They  measure  a  conglomerate  of  achievements  condi- 
tioned by  the  three  factors  of  quality  of  product,  diffi- 
culty of  task,  and  time  consumed.  Of  course,  standard- 
ized tests  likewise  measure  a  complex,  but  the  methods 
of  their  derivation,  the  controlled  conditions  under  which 
they  are  given,  and  the  norms  established  make  them 
far  more  reliable  than  the  ordinary  classroom  examina- 
tion. 

Standardized  tests  are  devised  according  to  scientific 
procedure.  The  formulation  of  the  questions  or  state- 
ments of  the  ordinary  examination  are  based  on  the  judg- 
ment of  only  one  teacher  and  it  often  happens  that  the 
teacher  does  not  give  them  very  careful  consideration. 
For  the  ordinary  examination  there  are  no  standards. 
With  the  standardized  test  we  have  a  statement  of  what 
scores  the  pupils  of  the  several  grades  should  make.  The 
great  problem  in  scientific  education  is  to  construct 
objective  and  universal  scales,  about  the  use  of  which 
there  can  be  no  misunderstanding  when  they  are  placed 
in  the  hands  of  competent  teachers.     One  of  the  main 
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improvements  due  to  the  use  of  a  definite  set  of  stand- 
ards is  the  elimination  of  the  errors  of  prejudice. 

It  is  to  remedy  these  shortcomings  of  our  marking 
systems  that  the  scientific  movement  in  education  has 
devised  tests  and  scales.  Physical  measurements  require 
the  thing  to  be  measured  to  possess  the  quality  of  con- 
sistency and  to  remain  constant  while  it  is  being 
measured.  This  is  a  fundamental  necessity  for  logical 
thinking  about  measuring,  counting,  or  enumerating. 
The  things  counted  must  all  be  of  this  same  category. 
The  thing  measured  must  be  constant  in  its  character  or 
composition  so  that  one  unit  of  it  will  be  equal  to  any 
other  unit  of  it. 

The  process  by  w^hich  the  essential  characteristic  of 
consistency  is  obtained  in  educational  measurements  is 
the  one  used  in  physical  measurements.  It  consists  of 
distinguishing  the  possible  controlling  varying  factors, 
devising  means  of  holding  all  of  them  constant  save  one, 
and  measuring  that  one.  litis  is  the  law  of  the  single 
variable. 

A  standard  test  generally  has  been  given  to  pupils 
of  many  schools,  which  makes  it  possible  to  compare  the 
scores  made  in  one  school  with  those  made  in  another. 
Comparisons,  however,  must  be  made  with  care  since 
many  factors  other  than  the  teaching  factor  enter  in. 

Illustrating-  the  Law  of  the  Single  Variable. — The 
operation  of  the  law  of  the  single  variable  may  be  illus- 
trated in  the  methods  used  by  Burgess  in  designing  a  read- 
ing test.'*  The  aim  of  this  test  was  to  find  out  how  much 
printed  material,  of  a  given  level  of  difficulty,  a  child 
could  read  "well  enough  for  aU  practical  purposes."  The 
attempt  was  to  devise  a  test  in  which  the  child  could 
readily  succeed  if  he  read  well  enough  to  grasp  the  impor- 

*  May  Ayres  Burgess,  The  Measurement  of  Silent  Reading  (Depart- 
ment of  Education,  Russell  Sage  Foundation,  1921). 
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tant  thought  in  each  section,  and  in  which  he  could  not 
succeed  at  aU  unless  he  did  comprehend  each  important 
thought.  This  was  the  interpretation  which  was  put  upon 
the  phrase  reading  ''good  enough  for  all  practical  pur- 
poses. ' ' 

The  first  step  was  to  determine  what  the  conditioning 
factors  were  in  reading  and  choose  one  of  these  factors 
for  measurement.  The  following  is  a  list  of  25  factors 
which  the  author  of  the  test  dealt  with  in  measuring  silent 
reading,  with  the  disposition  she  made  of  them  :^ 

To  he  measured: 

Amount  child  can  do  in  given  time 
To  he  eliminated: 

Complex  thought 

Abstract  thought 

Technical  thought  and  language 

Catches 

Puzzles 

Accidental  leads 

Demands  for  spatial  imagination 

Irrelevant  dramatic  appeal 

Ability  to  reproduce 

Ability  to  remember 

Ability  to  reason,  or  infer 

Involved  style 
To  he  held  constant  throtigh  the  test: 

Memory  span  requirements 

Attention  span,  multiple 

Difficulty  of  action  demanded 

Time  required  for  complying  with  instructions 

Vocabulary  difficulty 

Sentence  structure 

Word  arrangement 

Amount  of  material  to  be  read 

Uniformity  of  print 

Uniformity  of  space  relations  between  pictures  and  print 

Case  of  finding  place  on  paper 

Interest  and  corresponding  effort  on  part  of  child 

5  lUd.,  pp.  37-38.  ~~~^ 
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Of  course  she  could  not  entirely  eliminate  all  the  factors 
listed  under  the  heading  ''To  be  eliminated,"  neither 
could  she  hold  entirely  constant  the  last  group  of  factors 
in  the  above  list.  Of  this  sort  are  the  factors:  Amount 
of  material  to  be  read ;  difficulty  of  action  demanded ;  and 
others  which  could  not  possibly  be  kept  entirely  constant. 
The  purity  of  the  final  score,  which  was  ''the  amount  a 
child  could  do  in  a  given  time,"  was  determined  by  the 
degree  to  which  she  was  able  to  control  some  of  these 
factors  and  eliminate  others.  Unless  this  could  be  done 
to  a  marked  extent,  her  final  scores  would  be  a  conglom- 
erate, or  a  measure  of  many  things  instead  of  one,  which 
measure  would,  in  fact,  be  a  measure  of  a  complex. 

As  was  said  above,  the  essential  characteristic  of  con- 
sistency may  be  obtained  in  educational  measurements  only 
by  distinguishing  the  possible  controlling  varying  factors, 
devising  means  for  holding  all  of  them  constant  save  one, 
and  measuring  that  one.  This  is  the  law  of  the  single 
variable. 

The  reason  why  marks  received  by  pupils  in  arithmetic 
and  other  subjects  usually  do  not  actually  record  their 
abilities  and  the  value  of  their  achievements  is  that  they 
do  not  measure  different  amounts  of  the  same  thing. 

The  law  of  the  single  variable  is  a  principle  of  measure- 
ment taught  in  the  earliest  school  years  and  increasingly 
recognized  in  the  comparative  judgments  of  every-day  life. 

The  tests  designed  to  measure  reading  ability  usually 
measure  in  addition  other  and  different  abilities. 

Time  of  Day  When  a  Test  Should  Be  Given.— There  is 
undoubtedly  a  best  time  in  the  day  to  give  a  school  achieve- 
ment test.  If  a  high  score  is  the  thing  desired,  other  things 
being  equal,  the  morning,  when  the  children  are  fresh,  is 
better  than  the  afternoon.  Fewer  errors  are  liable  to  be 
made  in  the  early  part  of  the  school  day. 

There  is  another  problem,  however,  to  be  taken  into 
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consideration.  It  is  the  question  as  to  how  near  we  want 
to  make  the  school  conditions  approach  those  in  life  out- 
side the  school.  In  our  regular  work-a-day  life  we  cannot 
choose  just  when  we  shall  do  the  difficult  task.  For  in- 
stance, a  railway  passenger  agent  may  have  been  working 
hard  all  day,  and  five  minutes  before  six  o'clock  in  the 
afternoon  twenty-five  people  may  rush  up  to  the  ticket 
window  and  demand  tickets  to  some  distant  points.  It 
may  be  that  the  train  is  already  on  the  track  ready  to 
depart  in  three  minutes.  Those  desiring  tickets  are  ex- 
cited because  they  fear  the  train  will  depart  before  they 
secure  their  transportation.  Under  conditions  like  these  the 
ticket  agent  must  answer  their  many  questions,  give  them 
the  proper  tickets,  answer  the  telephone,  make  change 
correctly,  and  attend  to  his  other  duties  as  ticket  agent 
and  telegraph  operator.  The  fact  that  he  has  worked  hard 
all  day  and  is  pretty  well  ''fagged  out"  before  the  rush 
comes  will  not  excuse  him  for  any  errors  he  may  make  as 
to  the  information  he  gives,  the  change  he  makes,  or  the 
tickets  he  sells.  He  is  held  strictly  accountable  for  every 
act.  It  would  have  been  to  his  advantage  to  have  had 
the  rush  in  the  early  part  of  the  day  when  he  was  fresh 
and  better  able  to  cope  with  the  situation.  But  he  was 
compelled  to  accept  it  just  as  it  happened  to  come.  One 
criticism  of  our  schools  is  that  they  are  so  different  from 
extra-school  life  that  children  prepared  in  them  are  not 
able  to  cope  with  life's  situations. 

It  is  true  that  the  children  may  be  nervous  and  more 
apt  to  "go  to  pieces"  in  the  latter  part  of  the  school  day 
than  in  the  forenoon,  but  that  may  be  one  of  the  reasons 
for  giving  the  test  to  see  how  well  they  can  perform  under 
conditions  which  closely  approximate  those  of  extra-school 
life. 

The  child  who  hasn't  been  trained  to  withstand  condi- 
tions of  this  kind  would  probably  "go  to  pieces"  when 
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he  rea^^hes  adult  life  and  attempts  to  fill  a  responsible 
position. 

It  is  not  maintained  that  one  should  pick  out  that  part 
of  the  school  day  when  the  children  are  pretty  well  fagged 
out  and  give  the  test  at  that  time,  but,  if  that  part  of  the 
day  happens  to  be  the  most  convenient  time  to  give  the 
test,  it  may  not  be  wise  to  postpone  it  simply  because  the 
children  are  not  as  fresh  as  they  are  at  some  other  time. 

Number  of  Times  a  Test  Should  Be  Given. — A  test  is 
given  to  a  class  for  the  first  time  to  find  out  the  condition 
of  the  class  in  reference  to  any  subject.  It  indicates  that 
the  class  has  reached  a  certain  state  of  proficiency  in 
reference  to  that  subject.  It  gives  the  teacher  a  point  of 
departure.  It  shows  the  strong  and  weak  points  of  the 
class.  It  indicates  where  work  should  be  done.  It  too 
often  happens,  however,  that  teachers  and  superintendents 
give  a  test  to  find  out  the  condition  of  a  class  in  reference 
to  a  certain  subject  and  then  never  act  on  the  information 
gained.  Such  procedure  is  like  a  physician  pronouncing 
a  case  typhoid  fever  and  then  making  no  effort  to  treat 
the  case.  The  test  should  indicate  what  needs  to  be  done. 
Teachers  should  act  on  the  information  given  and  test 
again  to  see  what  improvement  has  been  made.  There  are 
some  tests  which  cannot  be  repeated  at  short  intervals  be- 
cause the  children  would  remember  enough  from  the  first 
test  to  vitiate  the  results  if  it  were  given  a  second  time 
with  a  short  time  interval.  It  is  usually  easy  to  avoid  this 
difficulty,  however,  because  two  or  more  tests  of  equal 
difficulty  are  usually  made  in  reference  to  any  one  subject. 
The  practice  effect  is  thus  avoided.  There  are  certain  tests, 
however,  such  as  the  Courtis  Standard  Research  Tests  in 
Arithmetic,  Series  B,  which  may  be  repeated  at  short  inter- 
vals without  the  practice  effect  vitiating  the  results.  It 
would  be  a  rare  case  for  a  pupil  to  remember  the  answers 
to  any  of  these  problems  more  than  a  day  or  two.    Owing 
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to  the  nature  of  the  Monroe  Silent  Reading  Tests,  however, 
the  children  might  remember  the  correct  answers  to  them 
for  several  months. 

We  give  tests  for  the  same  reason  that  a  merchant  in- 
voices his  goods.  He  cannot  determine  his  profits  until 
he  makes  at  least  two  invoices.  Neither  can  a  teacher 
determine  the  progress  of  her  pupils  until  she  makes  at 
least  two  tests,  the  first  one  to  determine  a  point  of  de- 
parture, and  the  second,  or  succeeding  ones,  to  determine 
the  progress  made.  Without  the  tests  the  teacher  is  simply- 
guessing  at  the  progress  made,  but  she  never  really  knows 
to  what  degree  her  efforts  have  been  rewarded. 

How  Standard  Tests  Are  Helpful  in  Improving  Instruc- 
tion.— Standard  tests  render  assistance  to  the  teacher  in 
three  ways.  (1)  Since  the  test  has  been  constructed  after 
a  careful  analysis  and  survey  of  the  field,  it  gives  a  teacher 
a  list  of  things  that  the  pupil  should  be  able  to  do.  This 
is  well  illustrated  by  the  Ayres  Spelling  Scale,  which  con- 
tains the  1,000  most  frequently  used  words,  and  Monroe's 
diagnostic  tests  in  arithmetic,  which  give  a  list  of  the 
significant  types  of  examples  in  certain  fields.  (2)  Since 
the  tests  are  standardized,  the  teacher  may  know  just  what 
scores  pupils  ought  to  make.  She  is,  therefore,  given  a 
definite  objective  aim  to  strive  for  in  her  teaching,  an  aim 
which  the  pupil  can  understand.  The  advantages  of  a 
definite  standard  are  obvious.  (3)  Tests  furnish  the 
teacher  with  information  concerning  the  abilities  of  her 
pupils.  They  point  out  phases  of  strength  and  weakness. 
With  this  information  at  hand  she  can  plan  instruction 
which  will  be  more  efficient  since  it  will  meet  specific  needs. 

A  test  cannot  be  used  properly  unless  it  is  accompanied 
by  a  complete  set  of  instructions  for  giving  it,  for  scoring 
the  test  papers,  and  tabulatmg  the  scores.  For  tabulating 
the  scores  a  special  class  record  sheet  is  usually  provided. 

The  value  of  standard  tests  is  realized  through  the  use 
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that  is  made  of  the  scores.  They  must  be  interpreted  in 
terms  of  the  needs  of  tine  "pupils  for  instruction.  This 
means  doing  more  than  determining  whether  the  scores 
are  above  standard,  at  standard,  or  below  standard.  It 
means  doing  much  the  same  sort  of  thing  the  physician 
does  when,  after  he  has  ascertained  the  patient's  pulse 
rate,  temperatui'e,  and  other  symptoms,  he  prescribes  treat- 
ment. 

The  use  of  standard  tests  should  result  in  the  improve- 
ment of  instruction,  and  this  is  accomplished  by  means 
of  the  information  which  the  tests  yield  concerning  the 
abilities  of  the  pupils.  Growing  children  do  not  develop 
skill  by  instruction  or  personal  exertion  on  the  teacher's 
part.  They  develop  best  when  they  are  inspired  to  vol- 
untary effort.  Mere  repetition  does  not  develop  skill;  it 
is  repetition  accompanied  by  a  conscious  desire  to  improve 
that  brings  results.  Credit  should  be  given  for  growth, 
not  for  the  number  of  lessons  completed. 

It  is  easy  to  get  a  child  to  try  once,  but  he  will  not  keep 
on  trying  unless  his  efforts  bring  success.  Standard  tests 
automatically  set  for  each  child  a  task  within  his  reach. 
He  knows  when  he  gets  it  done. 

Speed  tests  are  timed  in  order  to  ''speed  up"  the  chil- 
dren's work.  Teaching  a  child  to  concentrate  and  to  work 
efficiently  does  not  mean  prodding  him  to  hurry.  Speed 
is  to  he  acquired  hy  study  and  practice,  not  by  special 
effort.  The  amount  of  work  done  in  a  given  time  is  merely 
a  symptom  which  may  indicate  to  the  teacher  whether  or 
not  the  child  has  studied  the  lesson  sufficiently. 

It  is  only  by  measuring  the  initial  ability  of  children 
in  the  fall  and  the  final  ability  in  the  spring  that  a  teacher 
may  know  the  progress  made. 

What  Kind  of  School  Achievement  Tests  Is  Most 
Important? — Much  discussion  has  been  carried  on  as  to 
the  relative  importance  of  various  types  of  tests.     It  is 
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sometimes  argued  that  the  Courtis  Standard  Research 
Tests  in  Arithmetic,  Series  B,  for  instance,  are  not  as 
valuable  as  other  tests  in  arithmetic  dealing  with  the  more 
complex  processes.  To  debate  whether  the  fundamentals 
in  arithmetic  are  more,  or  less,  important  than  the  more 
complex  processes  is  analogous  to  debating  which  is  the 
more  important,  the  foundation  of  a  house  or  the  super- 
structure upon  it.  One  cannot  exist  without  the  other. 
The  four  fundamentals  in  arithmetic  constitute  the  foun- 
dation upon  which  other  more  complex  processes  are  built 
and  are  absolutely  necessary  to  the  computation  of  these 
complex  problems.  The  point  is  that  we  need  tests  in  both 
fields.  Tests  in  the  fundamentals  will  not  give  informa^ 
tion  relative  to  the  higher  and  more  complex  processes, 
and  tests  in  the  complex  processes  give  information  only 
indirectly  as  to  the  fundamental  processes. 


CHAPTER  VI 

THE  CLASSIFICATION  OF  SCHOOL  ACHIEVEMENT  TESTS  AND  THE 
FUNDAMENTAL  PRINCIPLES  FOR  DESIGNING  THEM 

School  achievement  tests  show  many  lines  of  cleavage. 
Just  as  it  is  possible  to  classify  school  subjects  as  sciences, 
arts,  and  volitions,  or  as  formal  subjects,  content  subjects, 
and  expression  subjects,  so  it  is  possible  to  classify  tests 
and  measurements  into  many  groups  depending  upon  the 
particular  features  one  has  in  mind  when  the  classification 
is  made.  In  order  to  bring  out  clearly  some  of  the  more 
basic  principles  in  designing  tests  we  shall  note  some  of  the 
classifications. 

Diagnostic  vs.  General  Tests. — One  of  the  first  problems 
that  confront  those  making  a  test  is  whether  or  not  the 
test  shall  be  diagnostic.  By  a  diagnostic  test  we  mean  one 
which  furnishes  a  separate  measure  for  each  specific  ability 
in  the  field  of  the  test,  or  at  least,  of  the  important  abilities. 
Scientific  investigation  has  shown  that  in  a  subject  such 
as  arithmetic  there  is  not  simply  one  ability,  but  a  large 
number  of  abilities.  A  pupil  may  be  good  in  addition, 
poor  in  subtraction,  and  weak  in  multiplication.  It  is  a 
rare  case  when  a  pupil  is  equally  proficient  in  the  several 
abilities.  Each  school  subject,  therefore,  includes  a  num- 
ber of  abilities  which  are  specific  or  distinct  from  each 
other  to  a  considerable  degree.  The  degree  to  which  a 
test  is  diagnostic  depends  chiefly  upon  the  amount  of  sub- 
ject matter  in  that  particular  field.  The  Charters  Language 
Tests  for  Pronouns,  for  instance,  are  diagnostic  in  that  the 

180 


SCHOOL  ACHIEVEMENT  TESTS  181 

number  of  pronouns  is  limited  and  it  is  possible  to  give 
questions  and  examples  using  each  of  the  pronouns  in 
almost  every  conceivable  way  in  which  they  are  used  in 
oral  and  written  composition.  Such  a  test  may  give  a 
complete  diagnosis  of  the  child's  language  abilities  in  the 
use  of  pronouns  and  any  shortcomings  may  be  quickly 
located  and  corrected. 

A  general  test  is,  in  a  sense,  opposite  to  a  diagnostic  test. 
It  yields  average  or  composite  measures  of  a  child's  ability 
in  the  subject  matter  in  question.  It  may  be  illustrated 
by  the  miscellaneous  language  test  devised  by  Charters  or 
by  the  Courtis  supervision  tests  in  arithmetic.  The  latter 
is  a  single  test  covering  the  entire  field  of  all  the  operations 
with  integers.  The  test  gives  the  pupil's  general  or  average 
standing. 

Grade  norms  cannot  be  used  to  make  individual  diag- 
nosis. But  we  can  see  by  them  which  children  are  below 
and  which  are  above  the  level  that  they  should  attain  in 
their  grade.  Grade  norms  will  not  give  administrators 
what  they  most  need  to  know,  namely,  which  children  have 
progressed  at  the  rate  normal  for  their  age  and  native 
capacity,  and  which  are  performing  at  their  maximum. 

Degree  to  Which  Tests  Are  Diagnostic. — The  diagnostic 
characteristics  of  tests  range  all  the  way  from  zero,  where 
the  test  is  so  general  that  specific  abilities  are  completely 
lost  sight  of,  to  tests  which  may  be  said  to  be  100  per  cent 
diagnostic.  We  may  illustrate  this  fact  by  examining  some 
of  the  handwriting  scales.  Both  the  Thorndike  and  the 
Ayres  handwriting  scales  may  be  considered  almost  zero 
as  far  as  measuring  specific  abilities  is  concerned.  When 
these  scales  are  used  in  measuring  handwriting  the  sample 
to  be  measured  is  moved  along  the  scale  until  the  quality 
is  reached  that  most  resembles  the  sample  to  be  measured. 
The  specific  characteristics  of  the  sample  are  not  taken  into 
account.    When  the  score  is  made  up,  it  is  a  record  of  the 
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general  appearance  or  readibility  rather  than  a  measure  of 
one  or  more  specific  abilities. 

The  Freeman  Scale  consists  of  samples  arranged  accord- 
ing to  merit  under  each  of  five  headings,  namely:  (1) 
uniformity  of  slant;  (2)  uniformity  of  alignment;  (3) 
quality  of  the  line;  (4)  letter  formation;  and  (5)  spacing. 
Three  degrees  of  each  of  these  characteristics  are  included 
in  the  scale.  When  the  scores  are  made  up  they  show  the 
relative  degree  of  perfection  reached  in  each  of  these  abili- 
ties. This  test  may  be  said  to  be  diagnostic  because  it 
measures  specific  abilities.  A  supervisor  may  read  the 
scores  made  by  a  class  and  tell  the  teacher,  somewhat 
definitely,  where  the  weak  points  are.  Scores  of  this  kind 
indicate  where  drill  is  most  needed. 

The  Gray  Score  Card  for  the  Measurement  of  Hand- 
writing^ possesses  the  diagnostic  quality  to  a  still  higher 
degree  than  the  Freeman  Scale.  In  designing  this  scale 
Dr.  Gray  gathered  all  the  information  he  could  get  by 
reading  and  by  corresponding  with  teachers  and  super- 
visors of  writing  and  compiled  the  following  list  of  ele- 
ments of  writing: 


Beauty 

Spacing  of  letters 

Shading 

Spacing  of  words 

Legibility 

Spacing  of  lines 

Speed 

Alignment 

rormation  of  letters 

Movement 

Execution 

Eorm 

Position 

Size 

Slant 

Uniformity 

Neatness 

Accuracy 

Endurance 

Smoothness 

Uniformitj'  of  turns 

Uniformity  of  angles 

1              Uniformity  of  retraces 

Uniformity  of  loops 

'              XJnifonnity  of  slant 

Uniformit3'     of  size 

Uniformity  of  spacing 

Uniformity  of  beginnings 

Uniformity  of  endings 

Uniformity  of  height 

1  Bulletin  of  the  University  of  Texas,  No.  37,  July,  1915. 
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Ease  Touch 

Individuality  Effort 

Shape  of  letters  Proportion 

Lightness  Strength  of  line 

Parts  omitted  Parts  added 
Conformity  to  an  ideal 

This  list  of  specific  abilities,  together  with  others,  might 
be  used  as  a  basis  for  measuring  handwriting. 

Some  of  these  abilities  emphasize  the  manner  in  which 
the  writing  is  to  be  done.  Others  clearly  emphasize  writ- 
ing as  a  product.  Now  it  is  evident  that  attempts  to 
measure  and  evaluate  each  of  these  abilities  would  so  com- 
plicate a  writing  scale  that  it  would  spoil  it  for  practical 
use.  Many  of  the  things  must  be  eliminated.  The  first 
things  he  eliminated  were  specific  abilities  in  writing  as  a 
process.  He  then  turned  his  attention  to  writing  as  a 
product.  While  the  number  of  elements  cited  above  is  too 
great  to  be  given  a  place  in  a  writing  scale,  on  the  other 
hand,  the  list  must  be  sufficiently  large  to  cover  the  most 
essential,  if  not  all,  the  different  phases  of  handwriting. 
When  a  test  is  to  be  diagnostic,  the  terms  must  be  mutually 
exclusive  or  as  nearly  so  as  possible.  Gray  eliminated  the 
term  "beauty,"  for  instance,  because  it  is  a  function  of 
many  other  elements  and  must  not  be  selected,  because  it 
includes  too  much.  Neatness  and  legibility  were  eliminated 
for  the  same  reason.  On  the  other  hand,  the  spacing  of 
letters,  the  slant,  and  the  alignment  are  so  exclusive  that 
they  refer  to  specific  things  and  hence  may  be  considered 
of  diagnostic  value. 

It  can  readily  be  seen,  therefore,  that  the  problem  of 
selecting  the  proper  elements  to  go  into  this  score  card 
was  by  no  means  easy.  Three  or  four  plans  suggested 
themselves.  First,  the  correlation  between  general  merit 
in  writing  with  each  of  the  parts  in  the  list  given  above 
might  be  worked  out,  and  also  the  correlation  between  each 
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two  of  the  abilities.  This  method  was  abandoned  because 
it  would  require  a  choice  among  those  parts  for  which  the 
correlation  is  high  and  then  among  those  where  the  correla- 
tion is  not  so  high.  This  would  resolve  itself  into  an 
arbitrary  choice  which  was  the  very  thing  he  was  attempt- 
ing to  avoid. 

A  second  plan  which  suggested  itself  was  to  submit  the 
list  to  a  representative  body  of  teachers  and  allow  them 
to  choose  what  phases  should  be  measured.  This  plan  was 
objected  to  on  the  ground  that  it  was  not  known  in  advance 
how  many  elements  must  be  chosen  and  it  was  doubtful  if 
mutually  exclusive  points  could  be  obtained  in  this  way. 

The  plan  finally  adopted  for  determining  what  char- 
acteristics should  be  used  in  making  a  score  card  for  writ- 
ing was  one  which  grew  out  of  experience  with  the  card 
itself  together  with  the  arbitrary  choice  of  the  author.  A 
number  of  students  were  asked  to  grade  samples  of  hand- 
writing each  week  with  the  crude  score  card  with  the  idea 
of  determining  what  points  should  be  used  in  order  to 
give  a  complete  account  of  writing.  The  students  were 
cautioned  to  watch  for  points  in  the  writing  which  were 
not  covered  by  the  card  that  they  were  using.  In  the  light 
of  the  experience  thus  gained  it  was  found  that  writing 
might  be  rather  completely  described  under  nine  headings. 
These  are:  (1)  spacing  of  letters;  (2)  spacing  of  lines; 
(3)  spacing  of  words;  (4)  slant;  (5)  size;  (6)  alignment; 
(7)  neatness;  (8)  heaviness;  and  (9)  the  formation  of 
letters.  Neatness,  which  is,  in  part,  a  function  of  other 
elements,  was  finally  included  to  take  account  of  such 
points  as  blotches,  carelessness,  and  retracing. 

Many  of  these  headings  were  subdivided  in  order  to 
increase  the  diagnostic  value  of  the  scale.  The  formation 
of  the  letters,  for  instance,  was  subdivided  into:  (a)  parts 
omitted;  (&)  letters  not  closed;  (c)  parts  added;  (d) 
smoothness;  (e)  general  form. 
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A  diagnostic  test  of  a  different  nature  has  been  designed 
by  Dr.  Judd  and  was  first  used  in  the  Cleveland  survey. 
This  is  a  series  of  15  arithmetic  tests,  each  composed  of 
different  types  of  examples.  This  series  of  tests  has  been 
called  spiral  because  the  abilities  of  the  pupils  are 
measured  on  successive  levels  of  difficulty.  The  series  is 
diagnostic  rather  than  general.  It  yields  measures  of  rate 
and  accuracy  with  which  the  pupils  do  certain  types  of 
examples.  The  advantage  claimed  for  the  spiral  method 
is  that  it  offers  a  means  for  distinguishing  errors  due  to 
accident  from  errors  due  to  ignorance  or  incapacity. 
Errors  of  the  latter  sort  will  recur  at  regular  intervals 
and  may  be  readily  recognized.  Another  argument  for  the 
spiral  tests  is  the  fact  that  time  will  not  permit  as  many 
problems  in  a  particular  field  as  is  desired  so  that  each 
distinct  mental  operation  may  be  thoroughly  tested; 
hence,  the  device  of  a  spiral  arrangement,  by  which  several 
related  operations  are  combined  in  the  same  test  in  cyclic 
order. 

The   Courtis    Standard  Research   Tests,    Series   B,   are 

general  tests  in  the  field  of  each  operation  but  diagnostic 

to  the  extent  that  they  give  information  for  each  of  the 

four  fundamental  operations;  addition,  subtraction,  mul- 

I  tiplication,  and  division. 

Formal  Tests  and  Reasoning-  Tests. — Tests  may  be 
classified,  from  the  standpoint  of  the  kinds  of  mental  proc- 
esses involved,  as  formal  tests  and  reasoning  tests.  For 
measuring  skill  or  automatic  processes  we  use  the  former 
type.  These  tests  measure  immediate  specific  and  prepara- 
tory outcomes  of  school  training.  The  Courtis  Standard 
j  Research  Tests,  Series  B,  are  illustrations  of  formal  tests. 
On  the  other  hand,  we  may  design  tests  to  measure  general- 
ized outcomes  of  school  training.  The  Stone  Reasoning 
j  Test  in  arithmetic  may  be  classed  as  an  example  of  this 
kind. 
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Rate  Tests  and  Development  Tests. — Another  line  of 
cleavage  allows  us  to  divide  the  tests  into  rate  tests  and 
development  tests.  Courtis  and  Rugg^  especially  use  this 
terminology.  Rugg  makes  three  distinctions  between  these 
two  kinds  of  tests.  (1)  Rate  tests  are  distinguished  from 
the  development  tests  in  that  the  latter  make  use  of  the 
"time"  factor  only  incidentally  or  not  at  all.  That  is,  the 
students  are  given  practically  all  the  time  they  need  to  do 
the  test  and  their  scores  are  reckoned  only  slightly,  if  at 
all,  in  terms  of  time.  (2)  The  rate  test  differs  from  the 
development  test  in  that  the  latter  is  made  up  of  all  kinds 
of  subject-matter  ranging  from  the  purely  formal  and 
automatic  material  on  the  one  hand,  to  comjjlicated  reason- 
ing problems  on  the  other.  By  rate  tests  we  have  in  mind 
those  types  of  tests  in  which  ''the  ability  involved  in  the 
working  of  any  one  problem  is  roughly  the  same  as  that 
involved  in  the  working  of  any  other. "  ^  Of  course,  the 
ability  involved  in  the  solution  of  the  various  rate  problems 
is  not  exactly  the  same  as  that  involved  in  the  solution  of 
others  in  the  test;  but  the  same  general  type  of  mental 
processes  is  involved.  (3)  A  third  distinction  is  based  on 
the  organization  or  arrangement  of  the  parts  of  the  tests. 
In  the  rate  tests  one  of  two  plans  is  followed.  Either 
problems  involving  the  same  mental  processes  are  grouped 
together  in  one  test,  or  there  is  combined  in  one  test  a 
group  of  problems  involving  very  closely  related  abilities. 
"The  difficulty  of  each  example  in  the  test  has  been  deter- 
mined and  the  examples  have  been  arranged  in  terms  either 
of  a  rotating  or  a  cycle  principle,  or  all  problems  of  the 
same  difficulty  have  been  put  together. "  *  In  the  develop- 
ment test  the  difficulty  of  examples  has  been  determined 
as  in  the  case  of  the   rate  test,   and  the  examples  are 

2  Scientific  Method  in  the  Rcconf:tniction  of  Ninth-Grade  Mathematics, 
Supplementary  Educational  Monographs,  Vol  II,  No.  1,  1918. 

3  Ibid.,  p.  05.  *  Ibid.,  p.  66. 
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arranged  in  order  of  increasing  difficulty.    Different  kinds 
of  abilities  are  measured  in  the  development  test. 

Quality,  Difficulty,  and  Time  or  Amount  Tests. — In 
general  it  may  be  said  that  the  standard  educational  meas- 
urements fall  into  three  clearly  defined  groups  according 
to  which  of  the  three  variables  we  seek  to  measure.  They 
are  tests  and  scales  for  quality  of  product,  for  difficulty 
reached,  and  for  amount  done. 

1.  Quality  tests  attempt  to  answer  the  question,  How 
well  can  a  pupil  perform  a  certain  task?  Tests  in  com- 
position, drawing,  and  handwriting  are  primarily  quality 
tests.  They  are  not  scored  on  a  basis  of  right  and  ivrong ; 
but  there  are  all  degrees  of  merit  from  the  lowest  to  what 
might  be  called  perfection.  In  writing,  for  instance,  there 
is  no  rigM  or  wrong.  The  merit  simply  ranges  from  less 
good  to  more  good  through  a  continuous  series  of  degrees 
of  quality. 

Beading  is  a  classroom  activity  which  does  not  readily 
lend  itself  to  measurement  by  means  of  scales  of  quality. 
One  reason  for  tliis  is  that  it  does  not  result  in  a  tangible 
objective  product  which  can  be  scrutinized  and  measured. 
Another  reason  is  that  quality  in  reading  is  an  elusive 
thing  which  varies  not  only  with  different  people  but  with 
the  same  person  from  moment  to  moment  as  he  reads. 

2.  Difficulty  tests  attempt  to  answer  the  question.  How 
hard  a  task  can  a  child  do?  They  are  measured  in  terms 
of  right  and  wrong.  The  question  may  be,  How  hard  a 
word  can  a  child  spell!  How  difficult  a  problem  can  a 
child  solve  in  arithmetic?  In  scales  for  difficulty,  the 
variable  which  is  measured  is  the  difficulty  of  the  work 
which  the  child  can  do.  The  difficulty  of  successive  tasks 
is  carefully  increased  and  controlled  and  the  child  is  al- 
lowed to  overcome  it  if  he  can.  The  quality  of  his  work 
must  be  high  enough  to  be  considered  "right." 

The  commonest  of  all  classroom  questions  is  probably  that 
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which  relates  to  the  difficulty  of  the  work  which  the  child 
can  do  correctly.  The  usual  method  which  has  been 
adopted  to  answer  these  questions  is  to  prepare  a  series  of 
tasks  carefully  graded  in  difficulty.  Those  near  the  be- 
ginning of  the  series  are  so  easy  that  almost  any  child 
in  the  group  can  do  them.  As  the  series  progresses  the 
questions  become  increasingly  more  difficult.  In  scales  for 
difficulty  the  amount  of  time  allowed  for  the  test  should 
have  no  effect  on  the  score.  The  independence  of  time 
must  hold,  not  only  for  most  difficult  problems  near  the 
end  of  the  series,  but  also  for  easy  problems.  There  are 
some  types  of  difficulty  problems  that  a  child  can  answer 
correctly  at  once  or  not  at  all.  Spelling  illustrates  the 
point.  If  a  child  cannot  spell  a  word  immediately,  no 
amount  of  time  will  aid  him  in  his  efforts.  Spelling  and 
arithmetic,  and,  less  definitely,  geography,  history,  and 
grammar  constitute  appropriate  subject-matter  for  diffi- 
culty tests.  Some  of  these  are  informational  subjects  in 
which,  by  the  common  verdict  of  society,  the  information 
is  only  valuable  if  it  is  accurate  and  correct.  A  type  of 
handwriting  that  is  somewhat  inferior  to  another  sample 
may  be  of  almost  equal  practical  value.  The  same  cannot 
be  said  of  spelling,  arithmetic,  history,  geography,  or 
grammar.  Classroom  products  in  these  subjects  are  judged 
as  right  or  wrong. 

The  student  who  devises  a  scale  for  difficulty,  then,  must 
either  present  evidence  to  show  that  scores  for  the  par- 
ticular ability  he  seeks  to  measure  are  not  affected  by  dif- 
ferences in  time,  or  he  must  devise  methods  by  which  the 
amount  of  time  the  pupil  is  allowed  to  spend  on  each  task 
within  the  series  may  be  controlled  and  recorded.  The 
development  tests  discussed  above  have  many  character- 
istics of  the  difficulty  tests  here  under  consideration,  but 
are  not  like  them  in  every  particular. 

3.  Time  tests,  or  tests  for  the  amount  done,  are,  in  reality, 
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the  rate  tests  discussed  above.  Some  additional  character- 
istics, however,  will  be  given  under  this  heading.  It  is 
to  be  noted  that  time  and  amount  are  complementary  term^, 
each  of  which  depends  upon  the  other  for  its  meaning. 
Time  implies  amount,  and  amount  implies  time.  The  ques- 
tion, How  much  can  be  done  ?  demands  a  statement  of  the 
time  allowed  for  doing  it,  and  the  question.  How  long  will 
it  take?  depends  upon  how  much  there  is  to  do.  This 
variable  may  be  handled  in  two  ways:  (1)  The  problem 
may  be  to  determine  how  much  can  be  done  in  a  given 
time  as  in  the  Courtis  Arithmetic  Tests,  Series  B.  In  this 
test  the  time  is  eight  minutes  for  addition,  for  instance, 
and  the  information  sought  is,  How  many  problems  can 
be  solved  in  that  time?  (2)  A  definite  amount  of  work 
may  be  given  and  the  problem  is  to  determine  how  long 
it  takes  to  do  it.  The  latter  method  would  not  work  weU 
where  the  test  is  given  to  a  large  group  of  children  at  the 
same  time,  since  their  finishing  times  would  be  different 
and  therefore  hard  to  record.  Quality,  difficulty,  and  time 
or  amount  tests  may  be  illustrated  by  the  athletic  contests 
we  carry  on  in  our  schools.  Marksmanship  is  a  measure 
of  quality  answering  the  question,  How  well  can  one  shoot  ? 
The  other  variables  of  difficulty  and  time  are  kept  constant. 
The  high  jump  is  a  measure  of  difficulty;  quality,  which 
is  good  enough  to  clear  the  har,  and  time  are  kept  constant. 
The  100-yard  dash  is  a  time  test. 

The  three  questions,  how  well,  how  hard,  and  how  fast, 
represent  the  teacher's  attempts  to  measure  the  three 
fundamental  factors  of  quality,  difficulty,  and  time  or 
amount.  The  educational  tests  and  scales  that  have  been 
devised  during  the  past  ten  years  are  attempts  to  help  her 
answer  these  questions  and  each  of  them  seeks  to  measure 
some  one  of  those  same  three  fundamental  factors. 

Measurements  by  Opinion,  Comparison,  and  Standard- 
ized Tests. — The  methods  for  the  measurement  of  school 
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achievements  may  again  be  classified  according  to  other 
principles  as  follows:  One  method,  and  the  one  generally 
in  vogue,  is  that  of  personal  opinion.  Measurements  by 
this  method  are  valuable  just  to  the  extent  to  which  the 
persons  passing  judgment  are  qualified  to  give  expert 
opinion.  As  was  noted  in  the  introductory  chapter,  per- 
sonal opinion,  even  though  expert,  is  worthless  in  the  face 
of  facts  unless  the  opinion  happens  to  agree  with  the 
facts. 

A  second  method  of  measuring  school  achievements  is 
by  comparison.  We  may  say  that  one  pupil,  or  class, 
or  school,  ranks  fifteenth  as  compared  with  25  other 
pupils,  or  classes,  or  schools.  This  method  has  merit, 
but  the  chief  criticism  of  it  is  that  it  does  not  measure 
in  reference  to  a  standard.  Even  though  a  class,  or 
school,  ranked  first  among  25,  it  might  still  be  a  poor 
class  or  school.  The  ranking  or  comparison  method  is 
used  to  a  greater  extent  than  most  teachers  are  aware  of, 
however,  when  monthly  report  cards  are  made  up.  They 
usually  reason  that  A  is  a  better  student  than  B,  therefore 
B  is  given  a  grade  of  80  per  cent  while  A  is  given  85 
per  cent. 

Measurement  by  comparison  is  based  on  the  funda- 
mental idea  that  the  common  practice  is  the  result  of  the 
judgment  of  many  men  who  have  attempted  to  solve  the 
same,  or  very  similar  problems.  The  order  of  merit 
method  is  used  where  units  are  not  definite. 

A  third  method  of  measurement  and  the  one  about 
which  we  are  primarily  concerned  here  is  measurements 
by  scientifically  determined  standards  and  units.  This 
is  the  greatest  contribution  which  has  been  made  to  educa- 
tion in  the  last  twenty  years. 

Classification  by  Educational  Tests. — Thus  far  we  have 
discussed  the  general  classification  of  tests.  We  shall  now 
note  still  a  different  group  of  educational  measures  and 
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how  they  may  be  used  in  the  reclassification  of  pupils 
in  the  schools,  and  also  how  they  may  be  used  to  measure 
educational  processes  and  products.  It  is  obvious  that 
if  we  continue  to  teach  pupils  in  classes  we  must  so 
classify  them  that  those  of  approximately  the  same  men- 
tal ability  will  be  together.  Experiments  made  seem  to 
indicate  that  approximately  25  per  cent  of  the  pupils  in 
any  grade  belonged  mentally  to  a  lower  grade  and  about 
25  per  cent  belonged  to  a  higher  grade.  If  the  teacher 
does  justice  to  the  upper  25  per  cent,  he  is  over-teaching 
the  other  75  per  cent,  or  if  he  addresses  the  average 
pupils,  he  is  under-teaching  the  upper  25  per  cent  and 
over-teaching  the  lower  25  per  cent.  Fortunately,  we  are 
developing  methods  for  the  reclassification  of  pupils 
which  will  make  the  pupils  in  the  various  classes 
more  nearly  homogeneous.  Both  intelligence  and  school 
achievement  tests  are  employed  in  this  reclassification. 
Franzen,  McCall,  and  others  are  making  use  of  the  edu- 
cational quotient  (E.  Q.,  educational  age  divided  by 
chronological  age)  as  a  means  of  reclassifying  students 
and  measuring  their  school  progress. 

In  order  to  compute  a  child's  educational  age  in  any 
subject,  it  is  necessary  to  give  a  series  of  tests  to  a  large 
number  of  pupils  and  determine  the  norms  for  children 
of  different  ages  in  that  subject.  For  example,  suppose 
an  additional  test  were  given  to  a  large  number  of  chil- 
dren ranging  in  age  from  eight  to  fourteen  years.  Sup- 
pose further  that  the  average  number  of  problems  solved 
by  children  chronologically  eight  years  old  was  four; 
those  eight  years,  six  months  old,  six;  those  nine  years 
old,  eight;  those  nine  years,  six  months  old,  ten,  and  so 
on.  Then  a  child  doing  six  problems  would  be  considered 
eight  years,  six  months  old  educationally  in  that  test, 
irrespective  of  his  chronological  age.  If  a  great  number 
of  tests  were  given  to  a  child  in  the  subject  of  arithmetic 
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and  a  composite  score  were  computed  and  compared  with 
the  norms  for  children  of  the  various  ages,  a  child's  educa- 
tional age  in  arithmetic  might  thus  be  determined.  In  a 
similar  way  a  child's  educational  age  may  be  computed 
in  other  subjects. 

Educational  Age  and  Mental  Age  Compared. — It  is 
clear  that  educational  age  would  be  a  better  measure  for 
the  classification  of  pupils  than  mental  age  because  the 
former  represents  the  actual  accomplishment  of  the  pupil 
— the  pupil's  educational  status — while  the  latter  simply 
shows  potential  ability.  The  factors  determining  educa- 
tional age  are  both  hereditary  and  environmental.  At 
the  present  time  we  have  fairly  well-established  grade 
norms  in  the  various  subjects  but  do  not  have  educational 
age  norms.  Suppose  educational  age  norms  were  es- 
tablished, how  could  we  utilize  them  in  the  classification 
of  pupils  ?  In  what  ways  are  they  superior  to  mental  ages  ? 
If  age  norms  are  used,  it  will  be  possible  to  reclassify 
pupils  and  put  each  one  on  his  proper  educational  level. 
When  the  educational  age  is  used  as  a  basis  of  promotion 
or  demotion,  it  takes  cognizance,  not  only  of  the  pupil's 
mental  age  which  is  potential,  but  also  what  he  has  ac- 
tually done,  Avhich  is  one  of  the  best  ways  of  prophesying 
of  what  he  is  able  to  do. 

If  educational  age  is  to  be  used  in  the  reclassification 
of  pupils,  how  much  better  than  the  average  of  his  class 
must  a  pupil  be  before  he  is  allowed  to  skip  a  grade  and 
enter  a  higher  class?  Or,  how  much  poorer  than  the 
average  of  his  class  must  he  be  before  it  is  considered 
wise  to  fail  to  promote  him  or  even  to  demote  him?  Here 
both  the  E.  Q.  and  I.  Q.  are  taken  into  consideration.  As 
was  indicated  above,  the  best  way  to  prophesy  what  rate 
of  progress  a  child  can  make  is  to  find  what  rate  he  has 
made.  There  is  a  rather  high  correlation  between  the 
educational  age  of  a  pupil  and  his  mental  age.     Only 
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tentative  answers  can  be  given  to  these  questions  since 
not  enough  work  along  these  lines  has  been  done  to  be 
entirely  sure  as  to  what  is  the  best  thing  to  do.  It  is  the 
opinion  of  some  investigators  that  the  reclassification  of 
pupils  in  a  relatively  small  school  should  be  somewhat 
as  follows:  No  pupil  should  be  allowed  to  skip  a  grade 
unless  his  educational  age  exceeds  the  educational  age 
of  the  lowest  25  per  cent  in  the  grade  to  which  he  pro- 
poses to  enter.  In  the  matter  of  demotion  and  failure  to 
promote,  no  child  shall  be  denied  his  normal  promotion 
nor  be  demoted  unless  his  educational  age  falls  within 
the  lowest  25  per  cent  of  his  class.  These  rules  not  only 
take  cognizance  of  what  the  child  has  done  but  also  his 
potential  power.  They  do  not  mean,  of  course,  that  simply 
because  a  child  belongs  to  the  lowest  25  per  cent  of  his 
class,  he  shall  be  demoted,  or  because  he  exceeds  the 
average  of  his  class,  he  shall  skip  a  grade,  but  they  mean, 
if  a  pupil  were  an  average  pupil  in  class,  that  he  would 
not  be  denied  promotion  nor  be  demoted. 

Just  as  a  mechanic  working  in  a  garage  has  a  rather 
definite  procedure  in  diagnosing  engine  trouble,  so  an 
educational  expert  soon  develops  a  rather  definite  pro- 
cedure in  diagnosing  educational  ills.  When  a  child  fails 
in  promotion,  one  of  the  first  things  is  to  get  his  mental 
age.  There  is  usually  a  substantial  correlation  between 
mental  age  and  school  marks,  and  a  child's  mental  age 
may  throw  much  light  on  his  failure  to  be  promoted. 

Accomplishment  and  Educational  Quotients. — The  ac- 
complishment quotient  (A.  Q.)  of  a  pupil  is  found  by 
dividing  his  educational  age  by  his  mental  age.  It  shows 
the  degree  a  pupil's  actual  progress  approaches  his  poten- 
tial progress.  This  is  perhaps  the  best  measure  we  have 
of  the. instruction  and  the  application  of  the  pupil.  When 
a  report  of  this  kind  is  taken  home,  the  parent  may  form 
some  intelligent  idea  as  to  whether  the  pupil  is  working 
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up  to  his  optimum  capacity.  He  may  find  out  whether 
the  pupil  is  progressing  at  a  rate  normal  for  his  mental 
and  educational  ages.  This  also  becomes  a  protection  to 
teachers.  If  they  can  show  that  all  the  pupils  under  their 
guidance  have  progressed  at  a  rate  normal  to  their  general 
intelligence,  they  have  a  good  defense  for  adverse  criticism 
of  their  teaching. 

Another  unit  of  measure  that  is  being  employed  in 
measuring  educational  processes  and  products  is  the  educa- 
tional quotient  (E.  Q.),  which  is  the  educational  age 
divided  by  the  chronological  age.  It  is  the  division  of  what 
is,  by  what  it  would  be  if  the  pupil  were  normal.  It  gives 
the  percentage  of  normality.  The  quotient  thus  derived 
will  indicate  whether  the  pupil  has  made  normal  progress 
for  his  age  or  whether  he  has  progressed  more  rapidly  or 
slowly  than  the  average.  Indirectly,  at  least,  the  educa- 
tional quotient  throws  some  light  on  the  pupil's  general 
intelligence.  A  high  educational  quotient  usually  signifies 
a  high  intelligence  quotient,  but  not  always.  Well  taught 
pupils  with  a  low  I.  Q.  may  have  relatively  high  educa- 
tional quotients. 

When  pupils  are  classified  according  to  their  native 
capacities  and  educational  ages,  we  may  then  begin  to  judge 
the  quality  of  the  teaching  more  intelligently.  A  reason- 
able increment  or  interest  on  the  mental  capital  invested 
will  be  all  that  is  expected  of  teachers.  Educational  norms 
will  serve  as  goals  for  teachers  and  pupils.  Parents  and 
supervisors  wiU  not  expect  increments  out  of  proportion 
to  the  mental  capital  invested.  On  the  other  hand,  teachers 
will  know  when  pupils  are  doing  an  honest  day's  work, 
that  is,  when  they  are  accomplishing  a  normal  amount  for 
their  general  intelligence  and  educational  level. 

Principles  for  the  Choice  of  Subject-Matter  for  Educa- 
tional Tests  and  Scales.— What  shall  be  the  controlling 
factors  which  will  determine  the  type  of  subject-matter 
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used  in  tests  ?  Many  factors  enter  into  this  problem,  some 
of  the  more  important  of  which  we  shall  discuss  here.  As 
was  discussed  in  the  previous  chapter,  a  scale  is  a  ladder, 
or  a  linear  rule,  extending  from  the  worst  to  the  best, 
from  the  lowest  to  the  highest,  or  from  the  easiest  to  the 
hardest,  and  indicates  the  steps  or  degrees  by  which  inter- 
mediate achievements  may  be  gauged.  Now  it  is  evident 
that  one  making  a  scale  of  progressive  degrees  of  difficulty 
may  seek  subject-matter  with  only  this  idea  in  mind  and 
pay  no  attention  to  the  practical  or  utilitarian  value  of 
the  subject-matter  that  enters  into  the  scale.  This  method 
is  sometimes  called  the  statistical  method  as  opposed  to  the 
analytical  method,  which  does  take  cognizance  of  the  sub- 
ject-matter field  from  which  the  material  is  to  be  chosen. 
In  Woody 's  arithmetic  scales,  for  instance,  he  states  that 
his  "fundamental  idea  was  to  derive  a  series  of  scales  that 
would  indicate  the  type  of  problems  (examples)  and  the 
difficulty  of  the  problems  (examples)  that  a  class  can  solve 
correctly. ' '  ^  This  method  assumes  that  an  example  is 
suitable  for  use  in  a  test  simply  because  it  is  done  cor- 
rectly by  a  gradually  increasing  per  cent  of  pupils  as 
one  proceeds  from  grade  to  grade.  He  selected  his  test 
material  not  on  the  basis  of  its  arithmetical  significance 
but  on  the  basis  of  the  consistency  of  the  pupils'  reactions. 
It  seems  to  the  author  that  his  procedure  is  open  to 
criticism  on  the  ground  that  arithmetic  is  a  tool  subject 
and  there  is  little  use  in  testing  the  ability  of  pupils  to 
use  a  tool  that  they  will  rarely  or  never  be  called  upon 
to  use  in  life  outside  the  school. 

It  seems  that  in  the  tool  subjects,  at  least,  one  must 
determine  the  subject-matter  for  tests  and  scales  very 
largely  on  the  basis  of  its  utilitarian  value. 


^  Measurements  of  Some  Achievements  in  Arithmetic,  Teachers  College 
Contributions  to  Education,  No.  80  (1916),  p.  1. 
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Information  Desired  Determines  the  Type  of  Subject- 
Matter. — Tests  are  of  three  general  types,  difficulty  tests, 
speed  tests,  and  quality  tests.  The  kind  of  subject-matter 
used  in  the  test  will  be  determined  by  the  kind  of  in- 
formation desired.  If  a  teacher  wants  to  know  how 
rapidly  a  pupil  can  do  arithmetic,  for  instance,  the  prob- 
lem is  one  of  choosing  subject-matter  of  a  constant  level 
of  difficulty  and  of  uniform  quality,  which  is  that  quality 
which  the  public  has  been  accustomed  to  call  rigJit.  The 
subject-matter  should  be  chosen  from  fields  that  supply 
the  tools  for  doing  society's  work  in  arithmetic.  The  rate 
test  will  differ  from  the  difficulty  test  since  in  the  former 
but  one  type  of  subject-matter  is  usually  chosen,  while 
the  difficulty  test  usually  includes  several  types. 

Some  Characteristics  of  an  Ideal  Educational  Scale 

An  ideal  educational  scale  must  have  at  least  the  follow- 
ing characteristics:  (1)  it  must  have  an  accurately  defined 
zero  point;  (2)  the  steps  above  the  zero  point  must  be  of 
equal  magnitude;  (3)  the  scale  must  measure  the  desired 
educational  product;  (4)  it  must  be  so  simple  in  its  ap- 
plication that  it  is  adapted  to  the  classroom;  (5)  it  must 
not  require  an  undue  amount  of  time  in  administration. 
We  shall  note  briefly  the  significance  of  each  of  these 
characteristics. 

1.  The  Establishment  of  a  Zero  Point. — In  education 
as  in  the  physical  sciences  two  positions  may  be  taken 
as  the  zero  point  on  the  scale.  The  first  may  be  called  the 
absolute  zero,  which  means  just  not  any  of  the  thing  in 
question.  In  measuring  heat,  for  instance,  absolute  zero 
is  273  degrees  below  the  point  we  ordinarily  call  zero  on 
the  centigrade  scale  and  means  that  all  the  heat  has  passed 
out  of  the  thing  in  question.  Establishing  an  absolute 
zero  point  in  education  is  a  difficult  thing  to  do.    The  zero 
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point  on  the  Hillegas  Composition  Scale  was  determined 
by  40  professors  of  English,  editorial  writers,  psychologists, 
and  educational  experts.  The  composition  with  zero  merit 
reads  as  follows: 

Dear  Sir:  I  write  to  say  it  aint  a  square  deal.  Schools  is  I 
say  they  is  I  went  to  a  school.  Red  gree  gxeen  and  brown  aint  it 
hit  to  a  bit  I  say  he  don't  know  his  business  not  today  not 
yesterday  and  you  know  it  and  I  want  Jennie  to  get  me  out. 

That  is  probably  a  little  better  than  zero.  There  is  a  little 
merit  concealed  in  it  but  not  enough  probably  to  prejudice 
the  matter  seriously.  The  fact  that  zero  points  do  not 
stare  us  in  the  face  in  the  case  of  mathematical  originality, 
or  knowledge  of  German,  or  ability  in  writing  as  they  do 
in  the  case  of  measures  of  length,  and  weight,  and  time, 
is  no  excuse  for  not  trying  to  get  them.  If  we  get  scale 
points  defined  and  their  distances  defined  and  established 
in  reference  to  an  absolute  zero,  there  is  no  further  diffi- 
culty in  constructing  a  scale  to  measure  mental  achieve- 
ments. Such  scales  have  every  logical  qualification  that 
any  of  the  scales  in  the  physical  science  have. 

Zero  points  on  scales  are  imperfectly  known,  and  as  a 
result  we  add  and  subtract  educational  quantities  with 
much  less  precision  than  desirable.  We  cannot  say  that  one 
product  is  twice  as  good  as  another,  or  one  task  twice  as 
hard  as  another,  or  that  one  improvement  is  twice  as  great 
as  another  unless  we  establish  a  zero  point.  Statements 
of  these  kinds  are  intricate  and  subtle  matters  involving 
presuppositions  which  must  be  kept  in  mind. 

The  ordinary  scale  for  weight  exemplifies  an  ideal  scale 
in  four  respects.  First,  it  is  a  series  of  perfectly  definable 
facts.  All  men  the  world  over  know  exactly  what  is  meant 
by  two  grams,  four  grams,  etc.  In  the  second  place,  each 
amount  is  a  definite  amount  of  the  same  kind  of  thing.  In 
the  third  place,  the  difference  between  any  two  of  the 
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amounts  is  perfectly  defined  in  terms  of  some  unit  of  dif- 
ference. The  step  from  four  to  five  grams  is  the  same  as 
from  six  to  seven.  Lastly  the  zero  point  of  the  scale  is 
absolute.  That  is,  it  means  just  barely  not  any  of  the  thing 
in  question. 

Thorndike  acquired  an  actual  zero  produced  by  a  human 
being  in  penmanship.  He  has  a  signature  of  a  letter  which 
cannot  be  read  in  toto  and  in  which  no  letter  can  be  read 
by  any  one  of  hundreds  who  have  tried.  He  defined  zero 
as  the  suppositional  handwriting  such  that,  though  recog- 
nizable as  handwriting,  it  has  no  legibility,  no  beauty,  no 
value  as  penmanship.  When  we  used  zero  in  education  in 
the  past,  we  usually  have  had  in  mind  a  relative  zero.  The 
value  of  this  zero  is  usually  a  subjective  one,  hence,  ill- 
adapted  to  measure  educational  products. 

Education  is  having  the  same  difficulty  that  the  natural 
sciences  pass  through  in  the  standardization  of  their  units. 
In  the  matter  of  recording  temperature,  for  instance,  scien- 
tific progress  was  handicapped  by  the  fact  that  different 
individuals  were  using  different  reference  points  when 
measuring  temperature.  Finally,  after  long  and  costly 
delaj^s,  the  methods  of  measuring  temperature  was  reduced 
to  two  competing  systems,  one  of  which  took  the  freezing 
point  of  water  as  the  point  of  reference  and  the  other  took 
a  point  32  degrees  below  that  point.  In  measuring  the 
height  of  land  forms  scientists  agreed  to  take  sea  level  as 
the  point  of  reference.  Other  things  might  have  been  taken 
but  they  would  not  have  been  so  universally  applicable, 
and,  if  a  number  of  different  things  had  been  taken,  much 
time  and  labor  would  have  been  needed  to  convert  one  unit 
into  another  in  order  that  comparisons  might  be  made. 

In  education  the  tendency  has  been  to  search  for  some 
absolute  zero  point  for  the  trait  being  measured.  This  is 
obviously  a  difficult  task  and  results  in  much  confusion, 
even  assuming  that  it  can  be  scientifically  determined,  be- 
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cause  each  particular  test  will  have  its  own  zero  point. 
Many  methods  are  used  in  the  location  of  these  zero  points 
and  points  of  reference.  McCalP  mentions  six  methods 
of  locating  the  zero  points  in  scales  now  extant:  (1)  the 
reference  point  on  unsealed  tests  is  just  no  score  on  the 
material  of  the  particular  test ;  (2)  the  zero  point  is  guessed 
at  by  the  author  of  the  scale;  (3)  the  reference  point  on 
judgment  scales  is  the  median  judgment  of  judges  as  to 
the  location  of  zero  merit  in  composition,  handwriting,  and 
the  like,  as  in  the  cases  cited  above ;  (4)  the  zero  point  is 
located  by  the  use  of  the  per  cent  of  pupils  in  some  early 
grade  who  make  no  score  on  very  easy  material;  (5)  the 
reference  point  for  other  scales  is  three  times  the  standard 
deviation'^  below  the  mean  of  the  group  for  whom  the  test 
was  devised;  (6)  the  reference  point  is  simply  the  lowest 
score  made.  There  are  still  other  methods  for  locating 
points  of  reference  in  scale  building. 

Since  there  is  a  lack  of  agreement  as  to  just  what  the 
zero  point  in  reading  and  other  subjects  should  be,  McCall 
proposes  to  take  as  the  reference  point  for  school  achieve- 
ment tests,  not  a  zero  point,  but  the  mean  performance  of 
children  between  the  ages»  12  and  13  years.  He  thinks  that 
such  a  point  could  be  used  for  any  mental  trait  regardless 
of  the  location  of  its  absolute  zero,  if  such  there  be.^  He 
would  then  measure  the  school  achievements  of  children  in 
other  grades  in  terms  of  the  12-year-old  children.  The 
grades  ranged  from  5  S.  D.  (standard  deviation)  below 
the  mean  score  of  the  12-year-olds,  the  mathematical  zero 
of  the  scale,  to  5  S.  D.  above  the  mean  score  of  the  12-year- 
olds.     Each  S.  D.  was  divided  into  10  units,  making  the 


» "Proposed    Uniform    Method    of   Scale    Construction,"    Teachers 
College  Record,  Vol.  22,  No.  1,  Jan.,  1921,  pp.  31-51. 
'  See  Chapter  X  for  a  definition  of  standard  deviation. 
Wp.cit.,  p.  43. 
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entire  range  from  0  to  100  with  50  as  the  mean  score  of 
the  12-year-old  children. 

By  this  method  any  pupil  who  makes  a  score  of  50  has 
an  ability  equal  to  the  mean  ability  of  the  12-year-old 
children,  and  a  pupil  who  makes  a  score  of  40  has  an 
ability  of  10  units,  or  one  S.  D.  below  the  mean  ability  of 
12-year-olds.  A  pupil  with  a  score  of  75  is  2.5  S.  D.  above 
the  mean  ability  of  12-year-olds  and  so  on, 

McCall  gives  four  reasons  why  the  mathematical  zero  is 
located  5  S.  D.  below  the  mean  instead  of  at  the  mean: 
(1)  this  procedure  eliminates  cumbersome  plus  and  minus 
signs;  (2)  it  forms  a  convenient  range  of  points  between 
1  and  100  with  the  reference  point  at  the  easily  remem- 
bered 50;  (3)  this  procedure  carries  the  scale  down  and 
up  as  far  as  any  one  will  need  to  go;  (4)  it  gives  a  mathe- 
matical zero  which  is  close  to  the  supposed  absolute  zeros 
for  reading,  spelling,  writing,  composition,  completion,  and 
other  typical  mental  functions. 

One  objection  made  to  this  scale  is  that  the  "times  state- 
ment" cannot  be  employed  in  dealing  with  mental  traits 
because  the  absolute  zero  point  is  not  found.  That  is,  one 
cannot  say  that  John  has  three  times  the  ability  of  James 
in  a  certain  subject  unless  the  absolute  zero  point  of  ability 
is  determined.  Nevertheless,  5  S.  D.  below  the  reference 
point  cited  above  gives  a  mathematical  zero  that  cor- 
responds reasonably  well  with  the  absolute  zeros  deter- 
mined in  most  scale  making.  And,  when  all  is  considered, 
the  best  way  to  appreciate  ability  of  an  individual  is  to 
refer  him  to  the  mean  ability  of  his  own  or  some  standard 
group. 

Another  objection  urged  against  this  point  of  reference 
is  that  any  score  above  or  below  this  point  would  not 
indicate  whether  the  pupil  possessed  much  or  little  of  a 
particular  trait,  because  a  12-year-old  pupil  might  possess 
little  of  a  certain  trait  and  relatively  much  of  another. 


SCHOOL  ACHIEVEMENT  TESTS  201 

This  defect  is  to  be  remedied  by  the  use  of  the  absolute 
zero  of  the  trait,  provided  such  can  be  found. 

The  time  of  birth  is  used  in  most  scales  for  measuring 
general  intelligence  as  the  point  of  reference.  That  is,  in 
[attempting  to  measure  the  general  intelligence  of  an  in- 
j  dividual,  his  score  is  measured  in  terms  of  years  and  months 
[of  mental  age. 

j  McCall  would  not  only  standardize  the  points  of  refer- 
ence in  mental  measurements  but  he  would  also  standardize 
the  units,  and  in  each  case  would  use  some  function  of 
the  variability  of  12-year-old  children,  preferably  the 
standard  deviation.  Thorndike  and  his  students  have  con- 
stantly used  some  function  of  variability  as  the  unit  of 
measure. 

The  Binet-Simon  Scale  revised  by  Terman  has  met  the 
qualifications  for  scientific  scale  building  in  that  it  has  a 
definite  tangible  point  of  reference,  the  time  of  birth;  it 
is  simple  and  objective  and  is  easily  understood.  But  it 
fails  to  meet  one  condition  for  an  ideal  scale  in  that  the 
units  of  the  scale  are  not  equal.  The  development  of 
general  intelligence  is  measured  in  years  and  months.  The 
interval  between  eight  and  nine  years,  for  instance,  is 
larger  than  the  interval  between  14  and  15.  In  certain 
traits  the  unit  above  the  age  of  16  becomes  zero.  Because 
of  the  effects  of  social  conditions,  it  becomes  difficult  to 
build  up  a  scale  on  the  age  basis  which  will  measure  satis- 
factorily the  general  intelligence  of  pupils  below  the  age 
of  eight  and  above  the  age  of  12.  Hence,  a  scale  of  this 
kind  cannot  satisfactorily  score  pupils  with  exceptionally 
low  or  with  exceptionally  great  ability.  Judgment  scales 
discussed  above  may  be  converted  into  scales  of  this  kind 
thereby  making  aU  scales  performance  scales.^ 

'  For  a  more  elaborate  discussion  of  the  T  scale  the  reader  is  referred 
to  the  "Proposed  Uniform  Method  of  Scale  Construction,"  by  McCall, 
Teachers  College  Record,  Vol.  22,  Jan.,  1921. 
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2.  Making  the  Steps  of  Equal  Magnitude. — Three 
methods  are  in  common  use  in  making  the  steps  of  a  scale 
of  equal  magnitude. 

(a)  The  Method  by  Competent  Judges. — Many  things  in 
education  must  of  necessity  be  left  to  the  consensus  of 
opinion  of  competent  judges.  The  relative  merits  of  two 
drawings,  for  instance,  must  always  be  determined  in  this 
way.  Or,  if  we  do  not  use  the  functional  method  in  hand- 
writing, the  relative  merits  of  the  various  samples  to  be 
measured  may  be  determined  by  competent  judges.  By 
this  method  we  may  say  that  differences  in  merit  between 
samples  of  handwriting,  for  instance,  are  equal  when  they 
are  noticed  by  an  equal  number  of  competent  judges.  For 
example,  if  we  had  1,000  of  the  best  judges  in  handwriting 
in  the  world,  and  750  of  them  were  to  judge  a  certain 
sample  of  handwriting  designated  by  the  figure  10  as  better 
than  another  sample  designated  as  9,  and  the  same  number 
would  say  that  sample  number  11  was  better  than  sample 
number  10,  then  we  might  say  that  sample  10  is  as  much 
better  than  9  as  11  is  better  than  10  because  the  differences 
were  noted  by  an  equal  number  of  competent  judges. 

This  method  rests  on  the  fundamental  assertion  that 
equally-often-noted  differences  are  equal.  Owing  to  the 
nature  of  education,  it  is  probable  that  a  large  part  of  the 
pedagogical  scales  of  the  future  will  be  based  on  the  con- 
sensus of  judgments  by  competent  judges.  In  drawing, 
English  composition,  music,  handwriting,  and  such  sub- 
jects, it  is  practically  impossible  to  measure  the  results  save 
by  means  of  scales  thus  designed. 

It  should  be  noted  in  passing  that  the  whole  theory  of 
scale  development  may  be  classified  under  two  general 
methods:  the  judgment  method,  as  the  Thorndike  and 
Hillagas  scales,  and  the  ratio  method,  followed  by  Ayres 
in  making  a  spelling  scale,  and  by  Trabue  in  making  a 
language  scale.     It  may  readily  be  shown  that  the  school 
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subjects  are  susceptible  of  arrangement  in  a  certain  serial 
order  which  will  indicate  the  method  of  scale  derivation 
applicable  to  them.  At  one  end  of  the  series  will  be  such 
subjects  as  spelling  and  arithmetic,  which  lend  themselves 
to  the  ratio  method,  that  is,  to  the  expression  of  the  rela- 
tion between  the  actual  number  of  correct  responses  and 
the  possible  number  of  correct  responses.  At  the  other 
end  of  the  series  are  such  subjects  as  composition  and  pen- 
manship. Scales  for  these  subjects  must  be  derived  by  the 
judgment  method.  One  sample  of  handwriting  is  better 
than  another  not  as  a  fact  but  as  an  opinion.  A  specimen 
of  English  writing  is  better  than  another  precisely  because 
competent  judges  think  it  is  better.  Between  the  two  ex- 
tremes of  school  subjects  are  a  number  of  subjects  such 
as  history,  geography,  and  literature,  which  may  be  meas- 
ured by  either  method. 

(6)  By  the  Functional  MetTiod. — By  this  method  the 
quality  of  the  thing  is  not  measured  directly,  but  indirectly 
by  measuring  the  degree  to  which  the  particular  thing 
or  product  functions.  The  Gettysburg  Handwriting  Scale 
is  an  example  of  this  kind.  By  this  method  two  samples 
of  handwriting  were  said  to  be  equally  legible  if  readers 
could  read  one  sample  as  rapidly  as  the  other.  The 
method  of  making  the  steps  equal  is  as  follows:  Suppose 
three  samples  of  handwriting  are  being  considered,  of 
which  expert  readers  can  read  sample  A  at  the  rate  of 
100  words  per  minute,  sample  B  at  the  rate  of  120  words 
per  minute,  and  sample  C  at  the  rate  of  140  words  per 
minute ;  then  we  may  say  that  sample  B  is  as  much  better 
than  sample  A  as  sample  C  is  better  than  sample  B,  and 
that  therefore  the  steps  are  equal. 

Measures  in  the  physical  sciences  are  quite  often  made 
not  by  measuring  the  thing  directly  but  by  measuring  some- 
thing that  varies  with  it.  For  instance,  we  do  not  measure 
heat  directly  but  measure  the  length  of  a  mercury  column 
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which  we  know  varies  directly  with  the  amouni;  of  heat 
in  the  body.  By  this  method  we  know  that  the  amount 
of  heat  that  it  takes  to  raise  a  column  of  mercury  from 
a  point  marked  10  on  the  scale,  for  instance,  to  one  marked 
11,  is  the  same,  approximately  at  least,  as  the  amount  of 
heat  that  it  takes  to  raise  the  mercury  column  from  11  to 
12  or  from  15  to  16.  By  the  same  principle,  Ayres  es- 
tablished approximately  equal  steps  on  his  writing  scale 
by  assuming  that  progressive  degrees  of  merit  varied  di- 
rectly with  the  * '  readability, ' '  or  the  speed  with  which  the 
various  samples  might  be  read.  To  the  physicist  those 
differences  are  equal  which  are  produced  by  the  same  cause 
under  the  same  circumstances,  or  under  which  the  same 
conditions  produce  the  same  effect. 

For  a  long  time  we  measured  fatigue  by  measuring  the 
distance  between  the  points  of  the  aesthesiometer  when 
placed  on  the  skin  and,  although  the  correspondence  be- 
tween cutaneous  insensitivity  and  fatigue  has  been  more 
or  less  discredited,  it  is  not  discredited  on  the  ground  that 
the  fatigue  element  could  not  be  measured  in  this  way, 
provided  there  is  a  correspondence. 

Perhaps  it  should  be  noted  in  passing  that  none  of  these 
scales  approaches  perfection.  They  are  still  crude,  but 
much  better  than  the  old  methods  based  on  personal 
opinion. 

The  Ayres  Gettysburg  Handwriting  Scale  has  been 
criticized  especially  on  the  ground  that  in  making  the  scale 
the  different  qualities  of  handwriting  were  determined  by 
the  rapidity  with  which  the  samples  might  be  read,  but 
when  the  scale  is  used  in  the  school  room  the  merit  of  a 
particular  sample  is  determined  not  by  how  rapidly  it  may 
be  read  but  by  the  nearness  to  which  it  approaches  in  form 
and  general  appearance  a  sample  the  readability  of  which 
is  known.  That  is,  the  scale  was  made  from  a  functional 
standpoint,  the  speed  with  which  the  various  samples  might 
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be  read;  but  wben  a  sample  is  to  be  measured,  its  quality 
is  determined  not  by  the  speed  by  which  it  may  be  read 
but  by  the  nearness  it  approaches  in  form  and  general 
appearance  to  a  sample  on  the  scale. 

(c)  The  Proportion-of -Pupils-Solving  Method. — In  a 
spelling  test,  for  instance,  the  words  may  be  ranked  accord- 
ing to  spelling  difficulty  by  determining  the  number  of 
children  that  fail  to  spell  them.  For  example,  if  three  of 
the  words  are  home,  church,  and  separate,  and  95  per  cent 
of  the  children  are  able  to  spell  the  word,  home,  90  per 
cent  are  able  to  spell  the  word,  church,  and  85  per  cent 
are  able  to  spell  the  word,  separate,  we  may  be  assured 
that  the  words  are  arranged  according  to  their  spelling 
difficulty  from  the  easiest  to  the  most  difficult.  It  does 
not  follow,  however,  that  since  5  per  cent  fewer  pupils 
were  able  to  spell  the  word,  church,  than  were  able  to 
spell  the  word,  home,  and  since  5  per  cent  fewer  were  able 
to  spell  the  word,  separate,  than  the  word,  church,  that  the 
increase  in  spelling  difficulty  from  home  to  church  equals 
the  increase  from  church  to  separate. 

In  order  to  make  the  steps  of  equal  difficulty,  the  normal 
distribution  curve  is  brought  into  use.  We  may  illustrate 
how  the  steps  are  made  equal,  or  approximately  so,  by 
reference  to  the  headings  of  the  Ayres  Spelling  Scale, 
Figure  IV.  Dr.  Ayres  divided  the  words  in  his  spelling 
scale  into  26  divisions  or  columns.  The  words  in  each 
column  are  of  approximately  equal  spelling  difficulty,  and 
the  steps  in  spelling  difficult}''  from  each  column  to  the 
next  are  approximately  equal.  The  figures  at  the  top  of 
the  scale  indicate  the  approximate  average  scores  of  correct 
spellings  that  may  be  expected  among  children  of  the  dif- 
ferent grades  and  of  the  same  grade.  Thus,  in  column  K, 
for  instance,  the  58  at  the  head  of  the  column  is  the  average 
score  that  should  be  expected  from  second-grade  pupils 
attempting  to  spell  the  words  in  this  column.    The  average 
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score  for  third-grade  pupils  is  79,  for  fourth-grade  pupils, 
92,  and  so  on.  The  numbers  99,  98,  96,  94,  92,  etc.,  at  the 
top  in  Figure  IV  for  the  second  grade  are  as  near  the 
mid-points  of  the  equal  steps  as  can  be  obtained  without 
using  fractions.  That  is,  the  step  from  column  A  to  column 
B  equals  the  step  from  column  B  to  column  C,  and  so  on. 

In  making  the  spelling  scale  he  assumed  that  the  spelling 
ability  in  any  one  grade  is  distributed  according  to  the 
normal  probability  surface. 


Figure  V. 


ILLUSTRATING    THE    DISTRIBUTION    OP    SPELLING    ABILITIES 

{adapted  from  Ayres) 


That  is,  taking  any  grade,  as  the  third,  for  instance,  and 
representing  it  by  a  normal  distribution  curve  as  in  Figure 
V,  we  note  that  at  the  extreme  left,  the  curve  is  very  near 
the  base  line  which  indicates  there  are  very  few  excep- 
tionally poor  spellers.  In  the  middle,  the  curve  is  the 
greatest  distance  from  the  base  line  thus  representing  a 
large  proportion  of  medium  spellers.  The  median  line,  in 
Figure  V,  represents  the  50  per  cent  in  the  third  grade. 
Figure  IV.  The  horizontal  line,  xy,  from  the  median  to 
the  curve  represents  1  sigma  distance  (sigma,  o- ,  is  the  unit 
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for  standard  deviation)  and  intersects  the  curve  at  a  point 
at  which  it  changes  from  convex  to  concave. 

This  distance  is  always  a  constant  function  of  the  curve 
of  normal  distribution  and  in  the  Ayres  study  was  chosen 
as  the  unit  of  measure  along  the  base  line.  He  laid  off 
on  the  base  line  to  the  left  of  the  line,  MN,  a  distance 
equal  to  2.5  sigma.  The  part  from  iV  to  B  is  0.5  sigma, 
from  B  to  D  and  from  I>  to  P  is  1  sigma  each.  Since  the 
curve  does  not  meet  the  base  line  at  2.5  sigma  from  A',  but 
theoretically  meets  it  only  in  infinity,  we  may  assume  that 
since  only  0.62  per  cent  of  the  area  between  the  curve  and 
the  base  line  lies  to  the  left  of  point  P,  that  for  practical 
purposes  it  meets  the  base  line  at  2.5  sigma  from  N  and 
this  point  may  be  considered  as  zero. 

Ayres  thus  divided  the  base  line  into  five  equal  parts. 
He  called  the  extremes  from  left  to  right,  0  and  100  respec- 
tively. Assuming,  as  he  did,  that  the  entire  frequency 
might  be  included  between  points  2.5  sigma  to  the  left  of 
N  and  2.5  sigma  to  the  right  of  N,  he  found  that  7  per 
cent  of  the  entire  area  between  the  curve  and  the  base  line 
lies  to  the  left  of  the  line  erected  at  point  D.  Between  this 
line  and  the  perpendiculars  to  the  base  line  at  point  B  are 
24  per  cent  of  the  cases.  Between  the  lines  erected  at 
points  B  and  A  are  38  per  cent  of  the  cases;  between  A 
and  C  are  24  per  cent  of  the  cases ;  and  between  C  and  Q 
are  7  per  cent  of  the  cases.  In  scaling  the  words  he  found 
that  7  third-grade  children  out  of  100  failed  to  spell  the 
word  *'has,"  while  93  per  cent  spelled  it  correctly. 

Applying  this  fact  to  the  curve,  the  word  ''has"  would 
be  located  at  point  20  on  the  base  line  which  would  have 
7  per  cent  of  the  cases  to  the  left  of  it  and  93  per  cent 
at  the  right.  In  a  similar  way,  a  word  missed  by  31  per 
cent  of  the  children  would  be  located  at  point  B  on  the 
base  line.  By  the  same  method  all  the  words  were  thus 
located  in  the  spelling  scale. 
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By  dividing  each  of  the  five  divisions  on  the  base  line 
into  five  equal  parts,  Dr.  Ayres  made  a  total  of  25  steps 
ranging  from  0,  or  near  0,  to  100  in  his  spelling  scale.  The 
average  of  all  the  values  that  might  theoretically  be  con- 
tained in  each  of  these  25  steps  has  thus  been  determined 
to  the  nearest  whole  number  and  this  value  has  been  as- 
signed to  the  step.  These  25  values  are  100,  99,  98,  96,  94, 
92,  88,  84,  79,  73,  66,  58,  50,  42,  34,  27,  21,  16,  12,  8,  6,  4, 
2,  1,  0.    The  limits  of  these  values  are  as  follows: 
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Noting  the  scores  100,  99,  98,  96,  94,  etc.,  in  the  top  row 
and  at  the  left  in  Figure  IV,  we  see  that  the  steps  do  not 
appear  to  be  equal  since  they  are  1,  1,  2,  2,  2,  4,  etc.  In 
order  to  make  the  steps  of  spelling  difficulty  equal  from 
one  column  to  another  the  average  scores  are  computed  in 
per  cent  and  transmuted  into  units  of  standard  deviation. 
This  may  be  done  by  referring  to  tables  for  converting 
the  per  cent  failing  into  units  of  standard  deviation. 
We  shall  note  more  specifically  the  curve  in  Figure  V  to 
illustrate  two  measures  of  deviation  that  are  used  in  mak- 
ing the  steps  equal.  Theoretically,  the  curve  reaches  the 
base  line  only  in  infinity.  The  area  between  the  curve  and 
the  base  line  is  divided  into  10,000  equal  parts  and  each 
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particular  section  carefully  mapped  out  so  that  it  is  known 
exactly  where  lines  perpendicular  to  the  base  line  must 
be  erected  on  each  side  of  the  line  MN  to  include  the  middle 
half  of  the  area  of  the  figure,  or  any  other  fraction  of  it. 
When  lines  are  erected  at  equal  distances  from  the  line  MN 
so  that  the  area  between  the  curve,  the  base  line,  and  these 
two  perpendicular  lines  is  equal  to  one-half  the  total  area, 
the  distance  of  these  perpendicular  lines  from  the  line  MN  is 
known  as  the  prohahle  error.  In  other  words,  the  probable 
error  is  a  distance  on  either  side  of  the  measure  of  central 
tendency  {MN)  that  wiU  include  the  middle  half  of  the 
measures.  It  is  evident  that  the  line  FQ  may  be  divided 
into  any  desired  units  of  probable  error  (or  P.  E.,  as  the 
units  are  usually  designated).  It  is  also  evident  that  a 
unit's  distance  on  the  line  PQ  near  the  central  part  of 
the  figure  would  include  a  much  larger  area  than  the  same 
linear  unit  would  out  near  the  ends  of  the  curve.  The  area 
included  between  two  perpendicular  lines  erected  at  units 
distance  apart  at  some  column  as  N,  Figure  IV,  for  in- 
stance, may  be  eight  or  ten  times  as  great  as  the  area  in- 
cluded between  two  lines  similarily  drawn  at  or  near 
columns  Y  or  Z. 

Lines  erected  at  equal  distances  from  the  line  MN  so  as 
to  include  the  middle  68.26  per  cent  of  the  area  between 
the  base  line  and  the  curve  are  said  to  be  erected  at  a  dis- 
tance of  one  sigma  (  o-)  from  the  line  MN.  Sigma  {a) 
represents  the  standard  deviation.  It  is  evident,  therefore, 
that  the  line  PQ  may  be  divided  into  any  number  of  equal 
I)arts,  each  part  being  represented  by  sigma  or  a  fraction 
thereof.  As  in  the  case  of  P.  E.,  two  lines  erected  per- 
pendicular to  the  base  line  near  the  center  of  the  curve 
and  at  sigma 's  distance  apart  would  include  a  far  greater 
per  cent  of  the  area  than  if  erected  at  sigma 's  distance 
apart  but  near  the  end  of  the  curve. 

Now  if  one  transmutes  the  per  cents  given  at  the  head 
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of  each  column  in  the  Ay  res  Spelling  Scale  for  any  par- 
ticular grade,  as  for  instance  the  third,  into  units  of 
standard  deviation,  or  sigma,  it  will  be  found  that  in 
passing  from  left  to  right  the  words  grow  progressively 
more  difficult  by  approximately  equal  steps,  and  that  the 
size  of  the  steps  is  0.2  sigma.  Standard  deviation  is  dis- 
cussed more  fully  in  Chapter  X. 

3.  The  Scale  Must  Measure  the  Desired  Educational 
Product. — It  is  not  always  easy  to  tell  just  what  a  test 
measures  after  it  is  given.  For  instance,  it  is  very  doubt- 
ful just  what  the  Trabue  language  tests  measure.  They  may 
measure  general  intelligence  or  language  ability  or  both. 
The  scores  are  very  difficult  to  interpret.  Suppose  a  pupil 
makes  a  poor  score  on  these  tests.  What  shall  the  teacher 
or  the  administrator  do  as  a  result  of  it?  Because  of 
their  general  nature,  little  guidance  comes  to  the  teacher 
from  tests  of  this  kind.  On  the  other  hand,  if  a  pupil 
makes  a  low  score  on  the  Charters  pronoun  tests,  the 
teacher  knows  immediately  where  more  drill  work  should 
be  done.  It  gives  her  a  point  of  departure.  She  knows 
that  she  has  tested  the  pupil  in  a  specific  field  and  knows 
his  weak  points.  When  scores  are  a  result  of  many 
variables  it  is  difficult  to  tell  just  what  the  test  has  really 
measured.  Care  should  be  taken,  therefore,  to  see  that 
the  test  really  measures  what  it  is  designed  to  measure. 

4.  The  Test  Must  Be  so  Simple  in  Its  Application  that 
It  Is  Adapted  to  the  Classroom. — The  only  way  that  it  is 
possible  to  determine  the  intellectual  achievement  of  a  pupil 
is  by  what  he  does.  In  school  achievement  tests,  pupils 
are  asked  to  do  a  great  many  things,  and  it  is  evident 
that  if  the  directions  for  giving  the  tests  cannot  be  well 
understood  by  the  pupil,  the  product  will  not  be  a  correct 
measure  of  the  pupil's  ability.  Any  one  who  has  given 
tests  knows  that  even  when  the  instructions  are  so  simple 
that  it  would  seem  almost  impossible  for  a  pupil  "f o  mis- 
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understand  them,  yet,  in  a  class  of  40  pupils  there 
will  be  one  or  two  who  do  not  know  what  the  test  calls 
for. 

The  ideal  test  must  be  of  such  a  nature  that  the  record- 
ing of  answers,  or  the  execution  of  the  design,  if  a  draw- 
ing is  called  for,  may  be  done  with  the  minimum  amount 
of  time  and  energy,  so  that  practically  all  the  effort  will 
go  towards  the  solution  of  the  questions  or  problems 
rather  than  recording  the  answers  when  once  worked  out 
and  made. 

5.  Tests  Must  Not  Require  an  Undue  Amount  of  Time 
in  Administration. — Tests  that  require  an  undue  amount 
of  time  in  administration  will  be  neither  popular  nor 
practical.  Both  teachers  and  pupils  dislike  tests  that 
require  a  long  time  to  do  them  and  that  require  much 
writing  to  be  done.  In  tests  of  this  kind  much  energy 
is  wasted  in  recording  the  answers  after  the  problems 
have  been  solved  mentally  or  after  the  correct  answers 
have  been  determined  to  questions.  Furthermore,  the 
labor  in  scoring  is  so  great  that  too  much  of  the  teacher's 
time  must  be  spent  in  reading  the  papers.  It  isn't  always 
easy  to  design  a  test  that  will  satisfy  these  conditions, 
but  the  popularity  of  the  test  will  depend  very  largely 
on  these  two  points.  The  mechanical  phases  of  test  mak- 
ing are  as  important  from  the  standpoint  of  administration 
as  are  the  thought  phases.  Some  of  the  most  scientifically 
constructed  tests  we  have,  from  the  standpoint  of  ac- 
curately measuring  the  educational  processes  and  prod- 
ucts, are  so  mechanically  clumsy  that  teachers  dislike 
to  use  them.  It  may  even  be  desirable  that  something  be 
sacrificed  in  accuracy  in  order  to  increase  the  practicality 
of  the  tests.  This  is  safe,  however,  only  in  so  far  as  the 
net  gain  is  greater  than  if  most  of  the  emphasis  were 
placed  on  accuracy.  The  point  may  be  illustrated  from 
an  example  in  the  business  world.    From  the  standpoint 
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of  accuracy  the  grocer  might  obtain  a  pair  of  scales  that 
would  measure  sugar  to  one-one-hundredth  of  an  ounce; 
but,  practically,  the  net  gain  for  both  customer  and  grocer 
would  be  less  than  if  he  measured  much  less  accurately. 


CHAPTER  Vn 

SCORING  THE  TESTS  AND  TREATMENT   OF  THE  MEASURES 

In  this  chapter  the  problems  incident  to  the  scoring 
of  tests  and  the  distribution  of  the  measures  after  the 
tests  are  scored  will  be  presented.  While  the  teacher 
desires  to  know  what  each  individual  pupil  is  able  to  do  on 
a  test  she  also  wants  to  know  how  the  pupils  stand  as  a 
class  and  how  the  class  ranks  when  measured  by  es- 
tablished standards. 

Problems  of  Scoring. — The  problems  of  scoring  are 
many  and  varied.  The  particular  design  of  the  test  is 
very  largely  determined  by  the  way  the  scores  are  to  be 
obtained.  In  Chapter  V  we  noted  the  problems  incident 
to  test  making  in  a  general  way.  In  order  to  bring  out 
more  clearly  the  problems  that  must  be  met  and  solved 
in  designing  a  test  and  especially  to  make  clear  the  in- 
fluence that  scoring  has  on  the  general  design  of  the  test, 
the  actual  problems  which  confronted  the  makers  of  the 
Gregory-Spencer  Geography  Test^  will  be  presented. 

Example  in  the  Development  of  a  Geography  Test. — 
The  first  problem  that  confronted  the  makers  of  this  test 
was  the  question  as  to  whether  the  test  should  be  strictly 
diagnostic,  covering  quite  thoroughly  some  specific  field 
of  geography,  or  a  general  test  covering  in  a  general  way 
the  entire  field?  The  decision  of  this  question  was  made 
only  after  a  great  deal  of  research  work  had  been  done 


^  Designed  by  C.  A.  Gregory,  Professor  of  School  Administration, 
University  of  Oregon,  and  Peter  L.  Spencer,  Instructor  in  the  Uni- 
versity High  School,  University  of  Oregon.  Pubhshed  by  the  Bureau 
of  Educational  Research,  University  of  Oregon. 
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attempting  to  find  out  what  the  best  writers  in  the  field 
of  geography  considered  the  purpose  of  geography  to  be. 
It  was  then  necessary  to  consult  the  standard  texts  in 
geography  in  order  to  learn  what  subject-matter  they  con- 
tained and  how  the  subject-matter  was  divided,  and  to 
make  an  estimate  from  the  amount  of  space  given  to  the 
various  topics  as  to  their  relative  values.  There  is  prob- 
ably no  subject  in  the  curriculum  that  lacks  focus  in  pres- 
entation more  than  the  subject  of  geography.  The  field 
is  so  broad  and  it  contains  so  many  phases  where  emphasis 
might  be  placed,  that  after  one  has  exhausted  every  scien- 
tific means  available,  he  must  still  rely  in  part  on  his 
personal  judgment  as  to  what  constitutes  the  proper  sub- 
ject-matter for  the  test. 

The  question  as  to  whether  the  test  should  be  general 
or  diagnostic  was  arbitrarily  decided  by  making  it,  to  a 
limited  degree,  diagnostic.  Since  the  kind,  amount,  and 
order  of  presentation  of  geographic  material  are  deter- 
mined to  a  very  large  degree  by  the  textbooks  used,  it 
would  be  folly  to  design  a  test  covering  subject-matter  of 
a  different  kind.  In  the  design  of  the  test,  therefore,  the 
subject-matter  actually  being  taught,  rather  than  the  sub- 
ject-matter that  ought  to  be  taught,  furnished  the  material 
for  the  test. 

It  is  not  the  business  of  those  designing  a  test  to  say 
whether  the  things  taught  are  the  things  that  ought  to  be 
taught.  The  purpose  of  the  test  is  to  determine  how  well, 
or  to  what  degree,  the  children  know  and  can  do  the  things 
they  are  being  taught.  There  are  many  lines  of  cleavage 
in  the  subject-matter  of  geography,  and  it  was  necessary 
to  make  divisions  of  some  kind.  It  is  possible  to  divide 
the  field  into  physical,  political,  and  commercial  geography 
and  make  the  tests  with  this  division  in  mind.  Or  we 
might  think  of  the  subject-matter  being  divided  into  causal 
geography,  place  geography,  mathematical  geography,  map 
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study,  etc.  Even  granting  that  the  test  was  to  be  made 
according  to  any  one  of  these  divisions,  since  the  subject- 
matter  in  each  division  is  so  great  that  it  could  not  possibly 
be  covered  by  one,  or  even  two  or  three  tests,  it  was  neces- 
sary to  decide  what  particular  bit  of  subject-matter  should 
supply  the  material  for  the  test. 

State  Examination  Questions  Examined. — It  was  thought 
that  the  questions  prepared  by  state  departments  that  are 
to  serve  as  state  examinations  in  geography  might  give 
some  guidance  in  the  preparation  of  the  test.  A  letter 
was  accordingly  addressed  to  each  state  asking  for  lists 
of  geography  questions  that  the  state  departments  prepare 
for  the  seventh  and  eighth  grades.  Forty-seven  states  re- 
plied and  twenty-three  prepared  such  questions.  Some  of 
the  states  sent  questions  dating  back  five  years.  The  total 
number  of  questions  thus  received  was  about  1,300.  After 
eliminating  the  local  questions  such  as,  "Bound  your  state 
or  county,"  ''Name  the  counties  of  your  state,"  etc.,  the 
questions  were  compiled  according  to  continents,  countries, 
industries,  etc.  One  of  the  striking  characteristics  of  the 
questions  is  the  fact  that  they  lack  focus.  They  are  very 
widely  scattered. 

Another  source  of  information  was  a  representative 
group  of  courses  of  study,  which  were  consulted  for  the 
purpose  of  finding  out  what  phases  of  geography  were 
there  emphasized. 

The  Divisions  Chosen. — Applying  the  information  de- 
rived from  the  above  sources,  supplemented  by  our  per- 
sonal judgments,  it  was  decided  to  divide  the  subject-matter 
for  the  test  into  the  following  divisions:  (1)  Place  and 
fact  geography,  which  test  the  pupil's  knowledge  as  to 
where  important  cities,  rivers,  and  seas  are,  and  also  his 
knowledge  as  to  the  distinguishing  characteristics  of  a  large 
number  of  cities.  (2)  Causal  geography,  which  attempts 
to  discover  the  pupil's  power  to  reason  from  cause  to  effect. 
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This  phase  of  the  test  was  divided  into  two  parts,  one 
pertaining  exclusively  to  the  United  States,  and  the  other 
pertaining  to  the  world  as  a  whole.  (3)  Commercial 
geography.     (4)  Political  geography. 

The  Selection  of  the  Cities  to  Be  Used  in  the  Test. — 
Having  decided  that  place  and  fact  geography  should  con- 
stitute two  of  the  major  parts  of  the  test,  the  next  problem 
was  to  find  out  what  cities,  rivers,  seas,  etc.,  should  be 
located,  and  what  facts  concerning  them  should  be  called 
for  in  the  tests.  In  other  words,  since  all  the  important 
cities  could  not  be  located  in  a  test  of  this  kind,  and  all 
the  facts  concerning  the  cities  thus  located  could  not  be 
presented  in  the  test,  the  problem  of  city  selection  was  one 
that  called  for  some  method  of  choosing  the  most  important 
cities.  This  was  done  in  the  following  way.  In  the  summer 
of  1910  Professor  Whitbeck  conducted  a  geographical 
seminar  in  Cornell  University  and  had  in  his  class  about 
75  teachers,  principals,  and  superintendents  representing 
21  states.  This  class  was  divided  up  into  committees  and 
a  continent  was  assigned  to  each  committee  with  the  in- 
structions that  the  committee  was  to  find  the  most  im- 
portant cities  in  them.  They  were  to  find  cities  that  were 
so  important  that  an  American  school  teacher  should  teach 
their  location  rather  accurately.  They  were  also  to  teach 
why  they  were  important  and  for  what  they  stand  in  world 
affairs.  It  was  agreed  that  a  city  to  be  included  in  the 
list  must  stand  for  more  than  one  important  thing.  Lyons, 
for  example,  though  it  is  the  leading  silk-making  city  of 
the  world,  has  nothing  else  of  importance  that  an  American 
school  boy  needs  to  know.    Hence  it  was  not  included. 

These  committees  decided  upon  the  lists  of  cities  that 
should  be  taught  and  passed  them  over  to  a  committee 
of  the  faculty  on  geography  to  be  passed  upon.  This  com- 
mittee consisted  of  Professor  R.  H.  Whitbeck;  Professor 
Ralph  S.  Tarr,  Cornell ;  Professor  Albert  T.  Brigham,  Col- 
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gate;  Professor  Charles  McMurry;  Philip  Emerson,  cf 
Lynn,  Mass. ;  and  George  D.  Hubbard  of  Ohio  State  Un- 
iversity. 

Two-thirds  of  the  cities  listed  by  the  first  committee 
failed  to  pass  the  faculty  committee.  Any  city  of  tlie 
United  States  that  received  two  or  more  votes  from  the 
faculty  committee  was  retained.  No  foreign  city  was  re- 
tained unless  it  received  at  least  three  of  the  six  votes  of 
the  faculty  committee.  The  cities  selected,  together  with 
the  number  of  votes  each  received  from  the  faculty  com- 
mittee, are  given  below: 


United  States — 25  Cities 


New  York,  6 
Chicago,  6 
Philadelphia,  6 
St.  Louis,  6 
Boston,  6 
Baltimore,  2 
Cleveland,  3 
Buffalo,  3 
Pittsburg,  6 
San  Francisco,  6 
Cincinnati,  2 
New  Orleans,  6 
Milwaukee,  2 


London,  6 
Edinburgh,  6 
Glasgow,  6 
Madrid,  4 
Berlin,  6 
Hamburg,  6 
Vienna,  4 
Rome,  6 


Washington,  6 
Denver,  6 
Louisville,  2 
Minneapolis-St.  Paul,  6 
Kansas  City,  2 
Indianapolis,  2 
Duluth-Superior,  5 
Salt  Lake,  3 
Puget  Sound  Cities,  4 
Scranton-Wilkes-Barre,  3 
Galveston,  4 
Lowell,  3 


Foreign  Countries 
Europe — 16 


Naples,  6 
Athens,  6 
Constantinople,  6 
St,  Petersburg,  6 
Paris,  6 
Marseilles,  3 
Venice,  4 
Liverpool-Manchester,  6 
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Asia — 8 

Bombay,  6  Hong-Kong,  5 

Calcutta,  6  Jerusalem',  6 

Canton,  5  Tokio-Yokohama,  6 

Pekin-Tien-tsin,  6  Meeca-Medina,  3 

Western  Continent  Exclusive  of  United  States 

Montreal,  6  Buenos  Ajves,  6 

Quebec,  5  Havana,  6 

Rio  Janeiro,  6  Mexico,  6 

Africa,  Australia,  and  Islands  of  Sea 

Cairo,  5  Sidney,  3 

Cape  Town,  5  Manila,  6 

Johannesburg,  3  Batavia,  3 

Melbourne,  4  Honolulu,  6 

The  material  having  been  selected  and  the  amount  hav- 
ing been  determined  upon,  the  next  step  was  the  actual 
drafting  of  the  questions  and  statements  that  entered  into 
the  test.  If  the  test  is  to  be  entirely  objective,  the  personal 
equation  of  the  teacher  or  other  individual  grading  it  must 
be  entirely  eliminated.  If,  therefore,  pupils  were  allowed  to 
frame  their  own  answers  to  the  various  parts  of  the  test  it 
would  call  for  an  evaluation  of  these  answers  by  the  one 
scoring  the  papers.  That  is,  it  would  rest  on  the  judgment 
of  the  teacher  whether  the  answer  given  by  pupil  A  was  of 
the  same  value  as  that  given  by  pupil  B.  For  instance,  if 
the  question,  "Why  is  it  dry  east  of  the  Rocky  Mountains?" 
were  asked  there  would  be  a  variety  of  answers  with  vary- 
ing degrees  of  merit,  and  since  each  answer  might  contain 
an  element  of  truth,  the  one  scoring  the  papers  would  be 
called  upon  to  evaluate  these  answers.  In  order  to 
eliminate  this  difficulty  and  also  to  simplify  and  lessen 
the  labor  in  scoring  the  papers,  the  answers  were  framed 
by  those  making  the  test  and  the  pupil  was  called  upon 


220  EDUCATIONAL  MEASUREMENT 

to  put  a  cross  after  the  best  answer  given.  The  following 
statements  taken  from  the  test,  together  with  the  instruc- 
tions for  giving  them  illustrate  the  point. 

Below  are  given  several  facts  about  the  United  States.  Three 
causes  are  suggested  for  each  fact.  Read  the  statements  care- 
fully, then  place  a  cross  (X)  before  the  cause  which  you  think 
best  explains  the  fact. 

1.  The   plains  du'ectly   east   of  the  Rocky  Mountains   are   dry 

because : 

(a)  Few  trees  grow  on  them. 

(b)  The  winds  lose  their  moisture  before  they  get  to  them. 

(c)  The  land  slopes  eastward. 

2.  A  large  number  of  the  people  of  Pennsylvania  are  engaged  in 

the  manufacture  of  iron  and  steel  products,  because: 
(a)   They  have  no  lumber  with  which  to  build. 
{b)  Pennsylvania  has  many  great  iron  mines, 
(c)  Pennsylvania  has  much  coal  with  which  to  smelt  the  iron 
ore. 

3.  Seattle  is  farther  north  than   Chicago,  yet  it  has  a  mUder 

climate  because : 

(a)  Seattle  is  protected  by  the  Rocky  Mountains. 

(b)  Seattle  is  protected  by  heavy  forests. 

(c)  The  ocean  modifies  the  winds  which  blow  over  it. 

4.  Cattle  are  raised  on  the  Great  "Western  Plains  and  are  fattened 

and  prepared  for  market  on  the  prairie  lands  farther  east, 
because : 

(a)  The  market  is  in  the  east  and  the  prairies  produce  much 
grain. 

(b)  It  is  warmer  on  the  prairies  and  they  afford  better  pro- 
tection. 

(c)  The  eastern  people  are  richer  and  can  afford  to  buy 
them. 

5.  New  York  City  is  called  the  Gateway  to  America,  because: 

(o)  It  is  easy  to  get  to  the  interior  through  this  port. 
(&)  It  is  the  largest  city  in  America, 
(c)   It  is  on  a  navigable  river. 
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Advantages  of  Tests  Thus  Designed. — There  are  at  least 
five  advantages  of  tests  designed  like  those  cited  above: 

1.  Time  is  saved  for  the  pupil. — The  ideal  test  in  a 
subject  such  as  geography  is  one  in  which  the  pupil  may 
cover  the  maximum  amount  of  subject-matter  in  a  mini- 
mum amount  of  time,  and  in  which  practically  all  the 
time  may  be  spent  in  determining  the  proper  answer  or 
disposition  of  the  parts  of  the  test  with  a  minimum  amount 
of  time  in  writing  out  or  recording  the  answer  thus  formu- 
lated. In  the  first  part  of  the  test  cited  above,  for  instance, 
it  is  much  easier  simply  to  put  a  cross  before  the  correct 
statement  than  to  express  the  thought  in  the  proper 
language  and  take  the  time  to  write  it  down  on  paper. 

2.  Personal  equation  is  eliminated. — Each  pupil  is  given 
a  fair  and  unbiased  evaluation  of  his  ability  in  the  subject- 
matter  in  question.  If  he  has  the  cross  before  the  correct 
answers  his  score  will  be  as  high  as  that  of  any  other 
pupil  in  the  class.  The  personal  equation  of  the  teacher 
is  entirely  eliminated.  The  pupil  may  look  over  his  paper 
and  know  that  the  score  given  him  is  correct.  The  bonds 
of  friendship  between  him  and  his  teacher  are  thus 
strengthened,  otherwise,  there  is,  many  times,  a  strained 
relation  between  pupil  and  teacher  when  the  pupil  thinks 
the  teacher  has  not  properly  evaluated  his  paper. 

3.  Much  time  is  saved  in  scoring. — If  pupils  were 
allowed  to  formulate  their  own  answers,  each  word  in  the 
test  would  have  to  be  read  by  the  one  scoring  the  papers, 
thus  increasing  the  amount  of  labor  many  times.  By  the 
new  method  and  with  the  aid  of  a  key  in  the  hands  of 
one  scoring  the  papers,  the  labor  is  reduced  to  the  mini- 
mum and  much  of  the  drudgery  is  eliminated. 

4.  More  ground  may  he  covered  hy  a  test  thus  designed. 
— A  pupil  may  cover  four  or  five  times  as  much  ground 
by  a  test  thus  designed  in  the  time  allotted,  and  it  may 
be  covered  more  thoroughly.    He  is  less  fatigued,  because 
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the  labor  of  writing  is  eliminated,  and  he  does  not  dislike 
a  test  of  this  kind  because  most  of  the  drudgery  has  been 
removed  and  he  knows  he  will  get  an  unbiased  evaluation 
of  his  work  when  it  is  completed. 

5.  Pupils  must  review  the  whole  field  in  preparing  for 
the  examination. — Since  pupils  know  that  the  entire  field 
cannot  be  covered  in  an  examination  as  now  given,  they 
spend  considerable  time  trying  to  determine  what  ques- 
tions will  probably  be  asked,  and  spend  their  time  on  them 
instead  of  reviewing  the  entire  field.  The  old  alibi  that  in 
reviewing  for  the  examination  the  pupil  studied  every  part 
of  the  work  but  that  covered  by  the  examination  would 
no  longer  apply,  since  the  whole  field  would  be  covered. 

Determination  of  the  Scores. — It  is  extremely  difficult 
for  some  teachers  to  believe  that  such  a  test  as  the  one 
suggested  above  does  anything  more  than  give  the  highest 
score  to  the  luckiest  guesser.  They  say  it  is  a  game  of 
chance  that  cannot  possibly  give  a  correct  estimate  of  the 
pupil's  school  achievements.  In  spite  of  this  prejudice, 
chance  is  fatally  exact,  and  it  is  on  this  principle  that  the 
scoring  may  be  done  with  an  assurance  that  the  real  knowl- 
edge of  the  pupil  may  be  determined  in  a  test  of  this  kind 
or,  at  least,  as  accurately  as  by  the  old  method. 

If  one  were  to  take  a  hundred  pennies  and  toss  them 
into  the  air  one  hundred  times  and  count  the  number  that 
fell  heads  up  and  the  number  that  fell  tails  up,  he  would 
find  that  in  the  10,000  tossed  almost  exactly  half  of  them 
would  fall  heads  up. 

If,  instead  of  there  being  three  statements  to  choose 
from  in  giving  the  correct  reasons  in  the  above  test,  there 
had  been  only  two,  the  chance  would  be  analogous  to  toss- 
ing pennies.  A  student  who  knew  absolutely  nothing  about 
the  statements  would  get  them  right  approximately  50  per 
cent  of  tlic  time  by  wiere  chance.  Then  the  question  arises 
as  to  how  the  papers  should  be  scored  to  take  proper 
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cognizance  of  this  game  of  chance.  To  do  this  we  shall 
take  a  number  of  hypothetical  cases  to  show  that  it  is 
possible  to  make  the  proper  evaluation  in  each  case. 

1,  When  there  is  a  choice  between  two  answers. — Let  us 
suppose  there  are  20  parts  to  the  test  and  the  pupil  actually 
knew  10  of  them.  Then  he  would  mark  his  paper  as 
follows:  He  would  put  a  cross  before  the  10  he  actually 
knew  and  guess  at  the  other  10.  By  the  laws  of  chance 
he  would  get  50  per  cent  of  them  right.  Therefore,  his 
paper  would  have  15  of  the  20  parts  of  the  test  marked 
correctly.  In  order  to  make  proper  allowance  for  his 
chance  scores  we  should  subtract  the  number  he  marked 
incorrectly  (5)  from  the  number  he  marked  right  (15)  in 
order  to  get  his  actual  score,  10,  because  from  our  hypothe- 
sis we  know  that  the  num.ber  marked  incorrectly  would  be 
half  the  number  he  guessed  at.  Therefore,  his  actual  score 
would  be  10,  which,  according  to  our  hypothesis,  is  what 
he  actually  knew  about  the  test. 

Suppose  again  a  pupil  actually  knows  16  of  the  20  parts 
of  a  test.  How  would  the  scores  show  this  fact  ?  Since  he 
actually  knows  16  of  the  20  parts  of  the  test  he  would  put 
the  proper  check  marks  before  these  16  parts  and  guess  at 
the  other  4  parts.  He  would  get  2  of  these  four  parts 
right  by  the  laws  of  chance.  Therefore,  his  paper  would 
have  18  answers  marked  correctly.  Subtracting  the  num- 
ber he  had  wrong  (2)  from  the  number  he  answered  cor- 
rectly (18),  giving  him  a  final  score  of  16,  which  is  accord- 
ing to  hypothesis. 

2.  Where  there  is  a  choice  among  more  than  two  answers. 
— In  the  geography  questions  cited  above,  the  student  has 
a  choice  of  one  of  three  answers.  By  the  laws  of  chance 
he  would,  therefore,  get  33>^  per  cent  of  them  right  even 
if  he  knew  absolutely  nothing  about  them.  How  would 
the  papers  be  scored  in  a  case  of  this  kind?  Let  us  take 
the  case  cited  above  and  suppose  there  are  20  parts  to 
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the  test  and  that  the  student  actually  knows  11  of  them. 
He  would,  therefore,  put  the  proper  check  mark  before 
the  11  that  he  knew  and  guess  at  the  other  nhie,  three  of 
which  he  would  get  right  by  chance.  His  paper  would, 
therefore,  show  14  parts  marked  correctly.  Now  since  he 
would  get  but  one  in  three  right  by  chance  of  the  num- 
bers he  guessed  at,  we  know  that  the  number  marked  in- 
correctly would  be  twice  as  large  as  the  number  he 
marked  correctly.  Therefore,  we  would  subtract  one-half 
the  number  he  marked  incorrectly  (3)  from  the  total  num- 
ber marked  correctly  (14),  thus  leaving  his  final  score  11, 
which  is  according  to  hypothesis. 

If  the  number  of  answers  to  choose  from  was  four  in- 
stead of  three  and  a  student  actually  knew,  let  us  say,  16 
of  the  parts,  his  final  score  would  be  determined  as  follows : 
He  would  guess  at  four  of  them  and  get  25  per  cent,  or 
one,  of  them,  right  by  chance.  Therefore,  his  paper  would 
have  17  parts  marked  correctly.  Since  he  marked  only 
one  in  four  right  by  chance  the  number  marked  incorrectly 
would  be  three  times  as  large  as  the  number  marked  right 
by  chance.  Therefore,  one-third  of  the  number  marked 
wrong,  or  one,  would  be  the  number  to  be  subtracted  from 
his  total  score  in  order  to  get  his  actual  score. 

Some  Objections  to  Tests  of  This  Kind. — It  may  be 
argued  that  as  much  information  cannot  be  obtained  by 
this  method  as  the  old  method  because  a  child  told  to 
discuss  a  certain  topic  would  present  knowledge  that  would 
take  a  dozen  or  more  questions  to  bring  out  by  the  new 
method.  There  seems  to  be  much  truth  in  this  criticism. 
It  may  be,  however,  that  enough  additional  questions  may 
be  asked  by  the  new  method  to  offset  the  apparent  loss  in 
giving  up  the  old  system. 

It  is  also  argued  that  an  examination  of  this  kind  is 
more  or  less  superficial  and  does  not  involve,  to  any  great 
extent,  the  higher  thought  processes  such    as    reasoning, 
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imagination,  etc.  It  would  seem,  however,  that  these  proc- 
esses must  go  on  the  same  as  by  the  other  method.  The 
only  difference  is  that  the  new  method  does  not  impose 
the  mechanical  operation  of  actually  writing  out  the 
answers.  Some  claim  that  the  choice  of  words  and  the 
drill  in  sentence  formation  aids  thought  and  that  this  is 
all  lost  when  the  answers  are  ready  made.  They  claim  that 
abstractions,  comparisons,  and  reasoning  do  not  take  place 
to  the  same  degree  in  the  new  method  as  in  the  old.  It 
seems  to  the  writer  that  these  criticisms  are  not  well 
founded. 

Effect  of  Incorrect  Statements  Being  Placed  before  the 
Student. — Another  criticism  is  that  because  incorrect 
statements  are  placed  before  the  children  the  psycho- 
logical influence  is  bad,  the  false  statements  being  taken 
as  truths.  In  answer  to  this  criticism  it  may  be  said  that 
the  facts  do  not  warrant  the  statement  that  this  condition 
prevails  to  an  appreciable  degree.  Moreover  the  child 
would  bring  into  consciousness  the  right  and  wrong 
answers  in  forming  his  judgments  by  the  old  method  of 
giving  examinations. 

It  is  true  that  this  method  does  not  show  where  the 
reasoning  goes  wrong  or  ceases  altogether ;  but  it  does  save 
students  the  agony  and  perspiration  necessary  to  perpe- 
trate an  answer  like  the  following  cited  by  McCall.^  A 
student  was  asked  the  following  question  in  a  recent  course 
in  educational  measurements:  "Which  three  of  the  tests 
described  by  Whipple  do  you  think  would  be  of  most 
service  in  an  elementary  school,  if  your  school  had  a 
psychologist  to  apply  them?"     The  answer  was: 

The  tests  described  by  Whipple  embrace  most  of  the  difficulties 
that  would  be  embraced  in  problems  of  classroom  instruction.    I 

*"A  New  Kind  of  Scnool  Examination,"  JoumoZ  0/  Educational  Re- 
search.   Vol.  2,  pp.  33-46. 
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think  his  tests  embrace  a  great  variety  of  methods  of  approach 
and  it  seems  difficult  for  me  to  think  of  just  three  to  whom  the 
presence  of  a  psychologist  in  a  school  would  give  help.  I  would 
think  it  would  be  tests  in  which  knowledge  of  the  workings  of 
a  child's  mind  and  its  growth  and  development  would  be  most 
apparent  since  those  not  particularly  trained  might  focus  on 
others  not  of  this  kind.  I  feel  it  would  be  unwise  to  specifically 
mention  just  three  when  the  number  is  so  great  which  would 
fulfill  all  these  requirements.  Every  teacher  to  be  a  psychologist 
would  help  all  classroom  measurement  work  of  whatever  kind 
greatly,  I  know  since  we  cannot  know  of  the  influence  of  a  test 
upon  which  any  group  except  by  the  niental  reaction  produced. 

In  further  support  of  the  true-false  examination  it  is 
maintained  that  it  will  promote  a  better  feeling  between 
the  teachers  and  pupils;  pupils  will  no  longer  strive  to 
tell  what  they  do  not  know ;  it  eliminates  the  personal  equa- 
tion and  makes  a  more  pleasant  atmosphere  in  general. 

The  Values  to  Be  Assigned  to  the  Scores. — In  the  fore- 
going discussion  of  scoring,  the  test  questions  have  been 
difficulty  questions,  the  problem  being,  ''How  hard  a  ques- 
tion can  the  pupil  answer?"  The  answers  were  either 
right  or  wrong  and  each  part  was  given  an  arbitrary  value 
of  1.  It  is  evident  that  some  of  these  questions  are  more 
difficult  than  others.  The  question,  therefore,  arises  as  to 
whether  or  not  the  scores  should  be  weighted  so  that  credit 
may  be  given  according  to  the  relative  difficulty  of  the 
various  parts  of  the  test.  The  defense  of  this  method  is 
that  there  is  such  a  high  correlation  between  the  scores 
when  the  questions  are  weighted  and  when  they  arc  given 
an  arbitrary  value  of  1  that  the  additional  accuracy  is  not 
worth  what  it  costs  to  get  it. 

Charters  found  a  similar  condition  in  making  his  lan- 
guage tests  and  did  not  weight  his  scores  for  that  reason. 

Some  experimentation  was  done  in  the  School  of  Edu- 
cation, University  of  Oregon,  upon  the  correlation  between 
the  scores  made  where  each  part  of  the  test  is  weighted 
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according  to  its  difficulty  and  where  the  parts  were  given 
an  arbitrary  value  of  1.  The  data  were  taken  from  the 
Douglass  Standard  Diagnostic  Tests  for  Elementary 
Algebra,  the  Monroe  Standardized  Silent  Reading  Tests, 
and  the  Kansas  Silent  Reading  Tests  devised  by  Dr.  F.  J. 
Kelly,  From  random  samples  taKen  from  papers  in  each 
of  these  tests,  the  correlation  between  the  weighted  scores 
and  the  scores  where  each  part  was  given  an  arbitrary 
value  of  1  was,  in  each  case,  taken  as  0.9  or  above. 

General  Problem  of  Weighting  Scores. — In  test  making, 
however,  if  one  wishes  to  weight  the  scores  it  may  be  done 
by  determining  the  relative  difficulty  of  each  part  of  the 
test.  This  is  usually  done  on  the  principle  that  the  larger 
the  number  of  children  missing  a  part,  the  more  difficult 
that  part  is  and  the  larger  score  it  should  have.  The 
following  illustration  from  spelling  will  make  the  point 
clear.  In  giving  the  ordinary  spelling  examinations  or 
tests  the  usual  method  of  scoring  is  to  mark  the  papers 
on  a  basis  of  100  per  cent.  If  there  were  20  words,  each 
word  spelled  correctly  is  given  an  arbitrary  value  of  5.  It 
might  so  happen  that  two  of  the  words  in  the  test  were 
''home"  and  ''separate."  The  grading  of  the  papers 
might  show  that  98  per  cent  of  the  fifth  grade  in  a  certain 
city  system  were  able  to  spell  the  word  "home"  correctly 
whereas  only  40  per  cent  were  able  to  spell  the  word 
"separate";  yet,  by  the  old  method  of  marking  the  papers, 
a  child  would  be  given  5  per  cent  towards  his  final  grade 
if  he  spelled  the  word  "home"  correctly  and  the  same 
amount  if  he  spelled  the  word  "separate"  correctly.  It 
is  evident  that  since  the  spelling  difficulty  of  the  latter  is 
greater  than  that  of  the  former,  it  should  receive  a  higher 
score.  If  this  is  done  it  is  called  weighting  the  word  on 
a  basis  of  its  spelling  difficulty,  which  is  probably  the  most 
scientific  method  of  weighting  the  various  questions  or 
parts  of  tests  in  the  tool  subjects.     The  question  as  to 
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how  much  weight  to  give  to  a  particular  part  of  a  test 
is  generally  determined  in  one  of  two  ways. 

1.  By  the  teacher's  judgment. — This  is  the  usual  method 
followed  by  teachers  in  giving  the  ordinary  examination. 
If  the  scoring  is  to  be  done  by  the  per  cent  method  the 
teacher  may  arbitrarily  say  that  question  number  1  is 
worth  8  per  cent  and  question  number  2  is  worth  12  per 
cent,  and  so  on.  She  may  weight  the  questions  in  this 
way  on  a  basis  of  difficulty,  her  estimate  being  that  ques- 
tion 2  is  one-and-a-half  times  as  difficult  as  question  1,  or 
she  may  assign  these  weights,  not  because  question  2  ia 
more  difficult  than  question  1,  but  because  she  thinks  it 
is  of  more  importance  for  social  or  other  reasons  that  the 
children  should  know  question  2.  This  method  is  rarely 
followed  in  test  making. 

2.  By  weighting  the  parts  according  to  the  distribution 
of  abilities  as  shown  by  the  normal  frequency  curve. — In 
this  case  the  per  cent  of  pupils  missing  each  part  is  deter- 
mined and  these  per  cent  values  are  transmuted  into  units 
of  standard  deviation  (sigma)  or  probable  error  (P.E.). 
There  are  many  methods  that  may  be  followed  in  doing 
this.  One  of  the  most  common  is  perhaps  to  assign  an 
arbitrary  value  of  1  to  the  easiest  word  or  question  in  the 
test.  This  is  determined  as  indicated  above  by  finding  the 
number  of  pupils  who  are  able  to  answer  it  as  compared 
with  the  rest  of  the  questions  in  the  test.  The  procedure 
may  be  illustrated  as  follows:  Suppose  the  easiest  word 
in  a  spelling  test  is  spelled  by  90  per  cent  of  the  pupils 
and  the  next  word  in  order  of  difficulty  is  spelled  by  80 
per  cent  of  the  pupils.  AVhat  should  be  the  weighting 
assigned  to  these  two  words?  If  we  desire  to  measure  all 
the  words  in  terms  of  the  easiest  word  we  may  assign  an 
arbitrary  value  of  1  to  the  first  word. 

Tables  have  been  prepared  so  that  it  is  possible  to  con- 
vert percentile  scores  into  units  of  standard  deviation  and 
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weight  them  according  to  a  normal  distribution.  For  in- 
stance, in  the  above  example  a  word  missed  by  10  per  cent 
of  the  pupils  is  given  a  value  of  1.73  (approximately),  and 
one  missed  by  20  per  cent  is  given  a  value  of  2.16  (approxi- 
mately). Therefore,  the  relative  weights  assigned  to  the 
two  words  are  to  each  other  as  1.73  is  to  2.16.  If  we  give 
the  first  word  an  arbitrary  value  of  1,  then  the  weight  or 
value  of  the  second  word  would  be  1.25.^  There  is  no  par- 
ticular reason  why  the  easiest  word  or  question  should  be 
given  an  arbitrary  value  of  1  other  than  the  fact  that  some 
fractions  may  be  avoided  by  this  procedure. 

Accumulation  Scores  and  Scores  of  Greatest  Difficulty. 
— There  are  two  general  ways  of  determining  the  final 
scores  of  pupils.  One  method  is  to  give  each  question  or 
part  a  weighted  value  and  let  the  sum  of  the  scores  of  the 
various  parts  constitute  the  final  score  of  the  pupil.  This 
is  the  method  used  in  the  Kansas  Silent  Reading  Tests 
devised  by  Dr.  Kelly,  the  Monroe  Reading  Tests,  the 
Douglass  Algebra  Tests,  and  many  others.  The  final  score 
given  a  pupil  is  the  accumulated  values  or  weights  given 
to  each  part.  This  procedure  is  followed  also  where  the 
parts  are  not  weighted  but  each  part  is  given  an  arbitrary 
value  of  1  as  in  the  Courtis  arithmetic  tests.  Here  each 
problem  is  given  a  value  of  1  and  a  pupil's  score  is  the 
sum  of  the  problems  solved  correctly. 

The  other  method  of  determining  the  final  score  of  a 
pupil  is  determined  by  the  weighted  value  of  the  most 
difficult  problem  a  pupil  can  solve.  The  Woody  Arithmetic 
Tests  illustrate  this  method  of  scoring.  The  final  score 
given  to  a  pupil  is  not  determined  by  finding  the  sum  of 
the  weighted  values  of  all  the  problems,  but  is  simply  the 
weighted  value  of  the  most  difficult  problem  solved. 


3  See   Harold   O.   Rugg,    Statistical   Method   Applied   to   Edvmtion, 
pp.  392-395. 
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The  same  principle  is  followed  in  the  Thorndike  Hand- 
writing Scale.  The  score  of  the  pupil  is  the  highest  quality 
reached  and  not  the  accumulation  of  the  quality  values 
ranging  from  the  lowest  to  the  highest  quality  reached  by 
the  examinee. 
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CHAPTER  VIII 

THE     MEASUREMENTS     OF     EDUCATIONAL     PROCESSES     AND 
PRODUCTS  IN  FIVE  FIELDS  OF  SCHOOL  WORK 

In  the  latter  part  of  Chapter  II  the  entire  field  of  tests 
and  measurements  was  arbitrarily  divided  into  seven  divi- 
sions. The  amount  of  work  done  in  each  division  seemed 
to  warrant  such  an  arbitrary  classification.  It  should  not 
be  inferred  that  the  divisions  made  are  the  only  divisions 
into  which  the  field  of  measurements  could  be  divided.  It 
does  seem,  however,  that  such  a  classification  is  quite  ex- 
clusive and  there  is  comparatively  little  overlapping  in 
these  fields.  Enough  work  has  been  done  in  each  of  them 
to  give  a  great  mass  of  data  which  are  beginning  to  throw 
a  great  deal  of  light  on  the  processes  and  products  of 
education.  In  the  last  five  chapters  we  have  discussed 
and  criticized  the  measurements  of  intelligence  and  the 
measurements  of  school  achievements.  In  this  chapter  we 
shall  discuss,  very  briefly,  the  other  five  fields,  not  with 
an  idea  of  treating  any  one  of  them  exhaustively  but  simply 
to  call  attention  to  the  work  that  is  being  done  along  the 
lines  mentioned  in  Chapter  II.  The  fields  are:  (1)  the 
measurements  of  the  materials  of  instruction ;  (2)  the  meas- 
urements of  the  physical  growth  of  school  children;  (3) 
the  measurements  of  the  money  cost  of  education;  (4)  the 
measurements  of  school  buildings;  and  (5)  the  measure- 
ments of  retardation,  acceleration,  and  elimination. 

With  the  possible  exception  of  the  fourth  category,  suffi- 
cient facts  and  data  are  extant  to  fonn  the  basis  of  a 
lai^e  volume,  and,  in  some  cases,  many  volumes,  on  each  of 
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these  fields  of  measurement.  It  was  also  pointed  out  in 
Chapter  II  that  each  of  these  fields  may  be  subdivided  into 
smaller  divisions  and  that  some  are  combined  to  form  new 
fields.  Tests  in  vocational  guidance,  for  instance,  may  in- 
volve measures  of  intelligence,  measures  of  school  achieve- 
ments, and  physical  measurements.  Other  measures  are 
similarly  combined  to  form  new  and  definite  fields.  With- 
out further  discussion  of  the  broad  general  field  of  meas- 
urements we  shall  deal  more  specifically  with  the  various 
divisions  as  outlined. 

I.  Measurements  op  the  Materials  of  Instruction 

It  is  only  within  the  last  decade  that  any  considerable 
work  has  been  done  in  measuring  the  materials  of  instruc- 
tion. To  some  it  has  seemed  like  educational  pedantry 
to  count  the  words  in  a  spelling  book,  or  to  score  a  text- 
book of  any  kind  in  order  to  get  an  analytic  conception 
of  its  contents.  On  the  other  hand,  the  folly  of  presenting 
material  year  after  year  with  little  knowledge  of  its  con- 
tent and  with  no  quantitative  conception  as  to  the  relative 
amounts  of  the  various  elements  that  compose  it  has  led 
many  students  to  an  intensive  study  of  the  materials  of 
instruction. 

Determination  of  a  Spelling  Vocabulary. — It  was  by 
studies  of  this  kind  that  a  spelling  vocabulary  of  school 
children  and  also  of  adults  was  determined.  Each  indi- 
vidual of  school  age,  or,  at  least,  after  he  has  passed  the 
first  two  or  three  years  of  his  school  life,  has  four  vocabu- 
laries: (1)  reading,  (2)  speaking,  (3)  hearing,  and  (4) 
spelling,  or  writing.  Spelling  being  a  tool,  there  would 
be  no  need  for  learning  to  spell  words  unless  an  individual 
would  use  these  words  when  he  writes.  The  problem  then 
is  to  discover  what  words  an  individual  uses  when  he 
writes.    This  can  be  done  only  by  taking  the  written  com- 
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positions  of  those  who  write  and  analyzing  them  in  order 
to  determine  the  words  used  and  their  frequency.  These 
compositions  may  include  business  letters,  friendship  let- 
ters, compositions  written  in  school,  the  composition  of  a 
daily  newspaper,  and  many  other  types  of  material. 

Since  it  is  not  possible  to  teach  all  the  words  that  one 
may  use  when  he  writes,  the  best  that  can  be  done  is  to 
determine  the  words  that  occur  in  greatest  frequency  and 
teach  them  as  a  minimum  word  list.  Many  perplexing 
problems  present  themselves  in  determining  the  proper 
word  lists,  such  as:  How  may  one  find  out  what  words 
should  be  taught?  Ayres  ^  made  up  his  list  of  1,000  words 
by  combining  the  results  of  four  studies  in  spelling.  One 
study  was  made  by  Rev.  J.  Knowles  of  London,  England, 
and  was  published  in  pamphlet  form  under  the  title,  * '  The 
London  Point  System  of  Reading  for  the  Blind"  (1904). 
In  making  this  study  the  author  took  passages  from  the 
Bible  and  other  literature,  containing  in  all  100,000  words, 
and  from  this  list  took  the  353  words  of  the  greatest  fre- 
quency. 

The  second  study  was  made  by  R.  C.  Eldridge  of  Niagara 
Falls.  Eldridge  made  an  analysis  of  250  different  articles 
taken  from  four  issues  of  four  Sunday  newspapers  pub- 
lished in  the  city  of  Buffalo.  He  found  that  they  con- 
tained a  total  vocabulary  of  6,002  different  words  and  43,- 
989  running  words. 

The  third  study  was  made  by  Ajtcs  and  published  by 
the  Russell  Sage  Foundation  in  a  monograph  entitled,  27ie 
Spelling  Yocabularies  of  Personal  and  Business  Letters, 
This  study  consisted  of  a  tabulation  of  23,629  words  from 
2,000  short  letters  written  by  2,000  people.  The  total 
vocabulary  used  was  found  to  consist  of  2,001  different 


'  A  Measuring  Scale  for  Ability  in  Spelling  (Division  of  Education, 
Russell  Sage  Foundation,  1916). 
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words.  The  number  of  appearances  of  each  was  reported 
in  the  monograph. 

The  fourth  study  was  made  by  Cook  and  O'Shea  and 
the  results  presented  in  a  book  entitled  The  Child  and  His 
Spelling,  published  in  1914.  This  study  consists  of  a  tabu- 
lation of  approximately  200,000  words  taken  from  the 
family  correspondence  of  13  adults.  The  total  vocabulary 
was  found  to  be  5,200  different  words. 

The  list  of  1,000  commonest  words  in  the  Ay  res  Scale 
was  finally  selected  from  these  four  studies  by  finding 
the  frequency  with  which  each  word  appeared  in  the  four 
studies,  weighting  that  frequency  according  to  the  size 
of  the  base  of  which  it  was  a  part,  adding  the  four  fre- 
quencies thus  obtained,  and  finding  their  average. 

Anderson  ^  attempted  to  determine  the  spelling  vocabu- 
lary by  analyzing  the  words  found  in  5,000  letters  gathered 
by  school  children  from  the  various  sections  of  the  state 
of  Iowa  and  incorporating  the  words  of  greatest  frequency 
into  a  minimum  spelling  list.  His  list  contains  approxi- 
mately 5,000  words. 

Another  problem  that  must  be  solved  is  the  question  as 
to  how  great  a  frequency  a  word  must  have  for  each 
100,000  running  words  before  it  is  incorporated  in  the 
minimum  spelling  list.  This  must  be  decided  arbitrarily. 
For  instance,  suppose  a  word  occurs  but  once  in  100,000 
running  words.  Is  that  frequency  sufficient  to  justify  its 
being  taught  in  the  elementary  school?  Or,  should  the 
writer  be  referred  to  the  dictionary  to  find  how  to  spell 
a  word  whose  frequency  is  but  1  in  100,000  running  words  ? 
It  is  probably  safe  to  say  that  a  word  with  a  frequency  of 
less  than  3  in  100,000  running  words  should  not  occur  in 
a  speller  for  the  elementary  school. 


-  The  Determination  of  a  Spelling  Vocabulary/  Based  upon  Written 
Correspondence  (Ph.D.  thesis,  University  of  Iowa,  1917). 
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Having  determined  the  list  of  words  that  should  be 
taught  in  the  elementary  school,  the  next  problem  is  to 
determine  the  order  in  which  they  should  be  taught.  This 
again  involves  measurements  in  order  to  properly  grade 
the  words.  Several  factors  enter  into  this  problem.  Words 
might  be  graded  on  a  basis  of  their  spelling  difficulty,  the 
easiest  words  being  assigned  to  the  lower  grades,  and  the 
more  difficult  ones  to  the  upper  grades.  Or  again  they 
might  be  graded  on  the  basis  of  use.  Children  in  the 
lower  grades  do  not  use  quite  the  same  vocabulary  as 
children  in  the  upper  grades.  Jones  ^  found  that  when 
children,  from  grades  2  to  8  inclusive,  were  asked  to  write 
compositions,  the  average  vocabularies  from  grade  to  grade 
were  as  follows: 

Grade  Number  of  Word?. 

2     521 

3     908 

4  1,235 

5  1,489 

6  1,710 

7  1,926 

8  2,135 

If  words  were  selected  exclusively  on  a  basis  of  use,  the 
words  used  by  children  in  the  second  grade  should  con- 
stitute the  spelling  vocabularj^  for  that  grade  and  addi- 
tional words  used  by  children  in  the  third  grade,  together 
with  those  left  over  from  the  second  grade,  would  con- 
stitute the  words  for  the  third  grade,  and  so  on.  Use, 
frequency,  and  difficulty  are  the  chief  factors  which  must 
determine  the  grading  of  the  words  in  the  subject  of  spell- 
ing. Extensive  measurements  have  been  made  in  all  three 
factors  and  the  grade  in  which  a  word  is  now  presented 

^  Concrete  Investigation  of  the  Material  of  English  Spelling  (University 
of  South  Dakota),  p.  23. 
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is  no  longer  a  mere  matter  of  opinion  but  is  the  result  of 
scientific  investigation. 

A  Study  of  the  Reading  and  Spelling  Vocabularies  of 
Books  Used  in  the  First  Three  Grades. — Perhaps  the  most 
thorough  and  complete  study  of  the  words  that  children 
of  the  first  three  grades  are  caUed  upon  to  read  and  spell 
has  been  made  under  the  direction  of  the  author  by  Miss 
Ruth  Chase. 

The  problem  originated  in  the  following  way :  The  Text- 
book Commission  of  the  state  of  Oregon  met  and  adopted 
a  list  of  books  to  be  used  by  children  in  the  first  three 
grades  of  the  elementary  school  for  a  period  of  six  years. 
The  books,  with  the  number  of  pages  in  each,  are  given 
below : 

1.  Beacon  Primer,  124  pages 

2.  Natural  Prmier,  122  pages 

3.  Beacon  First  Reader,  130  pages 

4.  Natural  First  Reader,  136  pages 

5.  Natural  Second  Reader,  256  pages 

6.  Natural  Third  Reader,  304  pages 

7.  Hamilton's  Essentials  of  Arithmetic,  124  pages 

8.  New  World  Speller,  Reading  Vocabulary 

9.  New  World  Speller,  Spelling  Vocabulary,  124  pages* 

The  Problem. — The  study  was  made  to  determine:  (1) 
the  number  of  different  words  a  child  would  be  called 
upon  to  read,  speU,  and  understand,  to  meet  the  minimum 
requirements  of  the  state  course  of  study  for  the  state 
of  Oregon;  (2)  how  rapidly  a  child's  reading  and  spell- 
ing vocabularies  grow,  assuming  that  the  work  is  taken 
up  in  the  order  indicated  by  the  state  course  of  study;  (3) 
the  frequency  of  the  words  used;  (4)  the  correlations  be- 
tween the  books  used,  that  is,  the  number  of  words  common 

*  The  New  World  Speller  was  divided  into  two  parts  for  purposes  of 
scoring.  The  eighth  book  mentioned  above  deals  with  the  reading 
material  a  child  must  master  in  this  book,  and  the  ninth  book  deals 
exclusively  with  the  words  he  must  learn  to  spell. 
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to  all  the  books;  (5)  the  number  of  running  words  in  the 
series. 

Terms  Used  in  the  Study. — Different  words  means  the 
number  of  separate  words  used  in  each  book.  If  the  proper 
name  "Mary  Jones"  were  used,  it  is  listed  as  two  words. 
The  plural  and  singular  of  a  word  were  counted  as  two 
words. 

There  were  503  different  words  used  in  the  Natural 
Primer,  although  265  of  these  were  not  counted  as  new 
words  because  they  had  been  used  in  the  Beaton  Primer. 

New  words.  A  word  used  for  the  first  time  is  called 
a  new  word.  In  the  Beacon  Primer  each  word  was  counted 
as  a  new  word  because  this  book  was  the  first  one  used 
in  the  series.  The  sum  of  the  new  words  used  in  each 
book  is  the  total  number  of  different  words  used  in  all  the 
books. 

Used  words  are  words  which  have  occurred  in  an  earlier 
book  of  the  series.  The  Natural  Second  Reader  contains 
1,770  different  words,  of  which  861  are  neiv  and  909  are 
used  words. 

Running  words  means  the  number  of  times  all  the  words 
occur.  For  example,  if  12  different  words  are  found  and 
the  sum  of  their  frequencies  is  49,  the  number  of  running 
words  is  49. 

The  Results. — Table  III  shows  the  distribution  of  words 
in  each  book.  The  numbers  at  the  heads  of  the  columns 
refer  to  the  readers  named  above: 

Table  III. — Distribution  and  Size  of  Vocabularies 


of     running 
of     different 


Number 

words 
Number 

words 
Number  of  used  words 
Number  of  new  words. 
Increase  in  vocabulary 

from  book  to  book. . 


1 

2 

3 

4 

5 

6 

7 

8 

7,007 

0,315 

8,455 

8.878 

25,332 

34.865 

12,574 

2,705 

743 

0 

743 

503 
255 
238 

804 
404 
395 

849 
584 
265 

1.770 
909 
861 

3,304 
1,416 

1,888 

1,282 
753 
529 

450 

392 

58 

743 

981 

1,376 

1,641 

2,502 

4,390 

4,919 

4,977 

6.767 

1.453 

1.240 

213 

5.190 
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The  study  reveals  the  following  facts: 

(a)  4,977  different  words  constitute  the  child's  minimum  read- 
ing vocabulary,  to  which  are  added  213  words  he  must  learn  to 
spell  which  are  not  found  in  his  reading  vocabulary,  making  a 
total  of  5,190  different  words. 

(b)  106,121  is  the  number  of  running  words. 

(c)  289  words,  or  5.8  per  cent  of  the  child's  reading  vocabulary, 
are  used  75,591  times'  and  constitute  71.2  per  cent  of  the  running 
words  in  the  reading  vocabulaiy, 

(d)  13  words  occur  27,458  times. 

(e)  10  words  occur  24,520  times. 

(/)   1,470  words,  or  29.9  per  cent  of  the  words,  occur  but  once. 

(g)  1,453  words  were  listed  in  the  speller  6,767  times,  or  an 
average  of  4.6  times. 

(h)   3,218  words  occur  29,060  times,  an  average  of  9  times. 

(i)  289  words  occur  75,591  times,  an  average  of  261  times  each. 

( j)  213  words  were  found  in  the  speller  which  were  not  found 
in  the  readers. 

Table   IV. — Thirteen  Words  of  Greatest  Frequency  Found  m 
THE  Readers  Compared  with  the  Ayres  *  List 


Rank  in 

Frequency  in 

Rank  in 

Readers 

Readers 

Ayres'  List 

1.  the 

7,927 

1.  the 

2.  and 

3,142 

2.  and 

3.  to 

2,441 

3.  of 

4.  a 

2,336 

4.  to 

5.  he 

1,790 

5.  I 

6.  of 

1,714 

6.  a 

7.  I 

1,629 

7.  ui 

8.  in 

1,473 

8.  that 

9.  you 

1,385 

9.  you 

10.  was 

1,343 

10.  for 

11.  said 

1,069 

11.  it 

12.  it 

942 

12.  was 

13.  is 

927 

13.  is 

27,458 

*  A  Measuring  Scale  for  Ability  in  Spelling  (Division  of  Education, 
Russell  Sage  Foundation,  1915) . 
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It  is  interesting  to  note  that  the  13  most  frequent  words 
found  in  this  study  are  identical  with  those  of  the  Ayres 
list  except  one.  This  list  contains  the  word  ' '  said, ' '  which 
is  not  found  in  the  Ayres  list;  and  the  word  "that"  is 
found  in  the  Ayres  list  but  not  in  the  thirteen  most  fre- 
quent words  in  this  list. 

The  Contents  of  Three  American  Histories. — Another 
study  made  under  the  direction  of  the  author  that  illus- 
trates measurements  of  the  materials  of  instruction  is  the 
scoring  of  three  textbooks  in  American  history.  In  June, 
1919,  the  Textbook  Commission  for  the  state  of  Oregon 
readopted  Mace's  School  History  of  the  United  States  for 
a  period  of  six  years.  The  book  had  been  in  use  for  a 
number  of  years  in  Oregon  and  was  generally  conceded 
to  be  a  satisfactory  text  in  United  States  history.  From  the 
fact  that  it  was  to  be  the  official  text  in  American  history 
for  the  next  six  years  it  was  thought  that  its  usefulness 
might  be  increased  if  its  contents  were  scored  in  reference 
to  some  of  the  most  salient  features  that  are  being  dis- 
cussed in  the  reorganization  of  history  courses  in  the 
grades.  With  this  thought  in  mind,  a  class  of  advanced 
and  graduate  students  taking  a  course  in  the  elementary 
curriculum  with  the  author  at  the  University  of  Oregon 
undertook  as  a  special  problem  to  score  the  book  in  ref- 
erence to  four  salient  features.  In  order  to  provide  some 
standards  for  evaluation  and  comparison,  two  of  its  com- 
petitors were  scored  with  it.  These  books  were  History 
of  the  American  People  by  Beard  and  Bagley,  and  History 
of  the  United  States  by  Gordy,  The  scoring  was  done  in 
reference  to  the  following  points : 

1.  Names  of  places — scored  under  the  follo\tTing  headings: 
(a)  Names  of  continents  and  countries 
(6)  Names  of  states  and  territories 

(c)  Names  of  rivers 

(d)  Names  of  cities  and  towns 

(e)  Names  used  in  connection  with  military  events 
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(/)  A  miscellaneous  list  which  did  not  fit  under  any  of  the 
above  headings 

2.  Names  of  men — scored  under  the  following  headings : 

{a)  Explorers  and  discoverers 
(6)  Rulers,  presidents,  and  governors 

(c)  Names  mentioned  in  connection  with  industry  and  in- 
vention 
{d)  Names  of  statesmen 
(e)  Names  mentioned  in  reference  to  military  matters 

3.  Dates 

4.  Amount  of  space  devoted  to  political,  social  and  economic, 

and  military  matters;  also  the  number  of  pages  devoted  to 
pictures,  maps,  and  illustrations. 

Table  V. — Names  of  the  Ten  Countries  of  Greatest  Frequency 


Mace 

Beard  and  Bagley 

Gordy 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

England 

United  States . 

France 

Spain 

Canada 

Mexico 

India 

Great  Britain . 

HoUand 

Alaska 

250 
129 
91 
77 
44 
42 
16 
16 
14 
12 

United  States. 

England 

France 

Great  Britain. 

Spain 

Germany .... 

Mexico 

Russia 

China 

Cuba 

333 
122 

93 
82 
68 
60 
35 
24 
21 
19 

England 

United  States. 

France 

Spain 

Me:dco 

Canada 

Cuba 

China 

Japan 

East  India .  .  . 

220 
158 
62 
58 
27 
18 
15 
11 
11 
10 

Table  VI. — Summary  of  Countries  and   Continents  Mentioned 


Mace 

Beard  and 
Bagley 

Gordy 

Total  number  of  coimtries  mentioned 

Number  mentioned  but  once 

42 
14 

785 

6 

286 

55 

15 

1,069 

6 

224 

48 
16 

Total  number  of  mentions  in  each  text 

Number  of  continents  mentioned       

687 
5 

Frequency  of  mention  of  continents 

161 
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Table  VII. — Names  of  the  Ten  States  and  Territories  of 
Greatest  Frequency 


Mace 

Beard  and  Bagley 

Gordy 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Virginia 

New  York .... 
Massachusetts 
Pennsylvania  . 
South  Carolina 
Maryland.  .  .  . 

138 
75 
72 
68 
44 
42 
36 
36 
36 
33 

Virginia 

Massachusetts 
Pennsylvania  . 
New  York .... 
South  Carolina 
Ohio 

86 
62 
58 
54 
43 
39 
38 
38 
34 
33 

Virginia 

Massachusetts 
New  York. .  .  . 
South  Carolina 

Georgia 

Louisiana 

Connecticut. . . 
North  Carolina 

Florida 

Pennsylvania  . 

76 
57 
51 
42 
25 
24 

Texas 

Texas 

California .... 
Kentucky .... 
Illinois 

24 

Tennessee .  . 
Kentucky .  . 
New  Jersey. 

21 
19 
19 

Mace  mentions  51  states  and  territories  with  a  total 
frequency  of  1,066;  49  of  them  he  mentions  two  or  more 
times.  Beard  and  Bagley  mention  51  states  and  territories 
with  a  total  frequency  of  1,085,  all  of  which  are  mentioned 
two  or  more  times.  Gordy  mentions  48  states  and  terri- 
tories with  a  frequency  of  572,  eight  of  which  are  men- 
tioned but  once. 

Table  VIII  records  the  number  of  rivers  mentioned  in 
the  three  texts.  Mace  mentions  56  rivers  with  a  total  fre- 
quency of  256,  of  which  32  appear  but  once.  Beard  and 
Bagley  mention  35  with  a  total  frequency  of  142,  of  which 
19  appear  but  once.  Gordy  mentions  43  with  a  total  fre- 
quency of  219,  of  which  22  appear  but  once. 

Table  IX  records  the  number  of  cities  appearing  in  the 
three  texts  with  their  frequencies.  Mace  mentions  153 
cities  with  a  frequency  of  720,  77  of  which  are  mentioned 
but  once,  and  26  are  mentioned  twice.  Beard  and  Bagley 
mention  186  cities  with  a  total  frequency  of  682,  of  which 
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Table  VIII. — Names  of  the  Ten  Rivers  of  Greatest  Frequency 


Mace 

Beard  and  Bagley 

Gordij 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Mississippi.. .  . 

Hudson 

Ohio 

Potomac 

Delaware 

St.  Lawrence.. 
Connecticut.. . 
Rio  Grande . . . 

James 

Niagara 

60 

25 

24 

24 

13 

11 

8 

8 

6 

5 

Mississippi. . 

Ohio 

Hudson 

Delaware .  .  . 
Potomac .... 
Columbia . . . 
Missouri. .  .  . 
Rio  Grande . 
Arkansas.  . . 
St.  Lawrence 

47 

21 

9 

7 

I 

4 
4 
3 
3 

Mississippi. .  . 

Hudson 

Ohio 

Shenandoah . . 
St.  Lawrence. 

Mohawk 

Potomac 

Connecticut . . 
Tennessee .... 
Delaware .... 

60 
34 

28 
10 
8 
8 
5 
5 
4 
4 

105  appear  but  once,  and  30  are  mentioned  twice.  Gordy 
mentions  140  with  a  total  frequency  of  529,  of  which  72 
appear  but  once,  and  16  appear  twice. 


Table  IX. — Names  of  the  Ten  Cities  of  Greatest  Freqxjbncy 


Mace 

Beard  and  Bagley 

Gordy 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

New  York .... 
Philadelphia .  . 

Boston 

Washington..  . 
Charleston.. .  . 

Chicago 

London 

New  Orleans. . 

Albany 

St.  Louis 

65 
60 
59 
45 
31 
25 
22 
21 
17 
17 

New  York .  .  . 
Philadelphia. . 

Boston 

New  Orleans . 

Chicago 

Washington .  . 
Charleston .  .  . 

St.  Louis 

Pittsburgh .  .  . 
Buffalo 

68 
48 
42 
26 
25 
22 
18 
17 
16 
12 

New  York .  .  . 

Boston 

Philadelphia.. 
Washington.  . 
New  Orleans . 
Richmond.. .  . 
Charleston .  .  . 

Hartford 

Baltimore .... 
Albany 

41 
34 
32 
31 
19 
14 
14 
12 
10 
10 
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Table  X  records  the  places  mentioned  in  reference  to 
military  events.  Mace  mentions  172  places  with  a  fre- 
quency of  433,  81  of  which  occur  but  once,  and  34  are 
mentioned  twice.  Beard  and  Bagley  mention  127  places 
with  a  frequency  of  260,  of  which  70  occur  but  once,  and 
24  twice.  Gordy  mentions  184  places  with  a  frequency  of 
560,  of  which  85  occur  but  once,  and  31  are  mentioned 
twice. 


Table  X. — Names  of  the  Ten  Places  of  Greatest  Frequency 
Appearing  in  Connection  with  Military  Affairs 


Mace 

Beard  and  Bagley 

Gordy 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Name              ^^^ 
^^^^          quency 

Richmond 

Vicksburg 

Chattanooga . . . 
Gettysburg .... 

Yorktown 

Concord 

Ft.  Sumter.  .  .  . 

Quebec 

Trenton 

Bunker  HUl.... 

18 
14 
9 
9 
9 
8 
8 
8 
8 
7 

Concord 

Richmond.  .  .  . 

Boston 

New  York .... 
Washington, 

D.C 

Virginia 

Lexington 

Gettysburg..  .  . 

Santiago 

Philadelphia.. . 

9 
9 

8 
8 

6 
6 
6 
5 
5 
5 

Richmond .... 
Mississippi 

River 

New  York .... 
Hudson  River . 

Virginia 

Washington, 

D.C....... 

Philadelphia..  . 

Boston 

South  Carolina 

17 

16 
16 
16 
16 

15 
14 
12 
11 

Table  XI  gives  a  miscellaneous  list  of  places  which  did 
not  fit  into  any  of  the  above  classifications.  Mace  men- 
tions 140  places  with  a  total  frequency  of  484,  of  which 
88  places  are  mentioned  but  once,  17  twice,  and  5  three 
times.  Beard  and  Bagley  mention  184  places  with  a  total 
frequency  of  696,  112  of  which  are  mentioned  once,  25 
twice,  and  18  three  times.  Gordy  mentions  136  places  with 
a  total  frequency  of  569,  72  of  which  are  mentioned  once, 
23  twice,  and  18  three  times. 
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Table  XI. — Miscellaneous  List  of  Names  of  Places  Mentioned 


Mace 

Beard  and  Bagley 

Gordy 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

Name 

Fre- 
quency 

New  England .  . 
Cuba 

85 
33 
20 
19 
16 
15 
14 

13 
9 

8 

South 

North 

New  England. . 
West   

94 
71 
49 
41 
27 
24 
16 
12 

10 
9 

South 

North 

New  England. . 
Allegheny  Mts. 
West 

114 

85 

Atlantic  Ocean . 
West  Indies 

57 
18 

Confederacy .  . . 
N  Nether  lands. 

Pacific  Ocean . . 
East   

16 

Atlantic  Ocean. 
LakeChamplain 
Pacific  Ocean. . 

Lake  Erie 

New  World .  .  . 

11 

PhiUppines .... 
District  of  Co- 
lumbia   

New  Amsterdam 
East  Indies.  .  .  . 

Confederacy..  . 
Atlantic  Ocean. 
Mississippi 

Valley 

Great  Lakes. . . 

9 

8 
8 

7 

Grand  Total  of  Places  Mentioned 

Mace  480 

Beard  and  Bagley 507 

Gordy .284 

To  this  total  should  be  added  a  number  of  places  con- 
nected with  military  events,  which  would  bring  the  grand 
total  up  somewhat  higher. 

Prom  the  foregoing  tables  one  is  impressed  with  the 
vast  number  of  geographical  facts  a  child  in  the  seventh 
and  eighth  grades  must  know  to  read  these  histories  in- 
telligently. 

If  mere  frequency  of  mention  is  in  any  way  indicative 
of  the  importance  of  a  place,  the  rivers,  cities,  places  con- 
nected with  military  events,  countries,  etc.,  may  thus  be 
evaluated. 

In  comparing  the  frequencies  of  mentions  of  the  cities, 
rivers,  countries,  states,  places  mentioned  in  reference  to 
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military  events,  and  even  those  in  the  miscellaneous  lists, 
one  is  struck  by  the  great  similarity  throughout  the  three 
texts.  The  first  ten  places  receiving  the  greatest  number 
of  mentions  in  one  text  are  practically  the  same  as  those 
in  the  others. 

If  this  is  a  fair  evaluation,  the  teacher  may  be  justified 
in  taking,  say,  the  ten  places  of  highest  frequency  from 
each  of  the  six  tables,  making  sixty  places  in  all,  for  more 
or  less  intensive  geographical  study. 

Table  XII  is  a  summary  of  five  tables  of  men  mentioned 
in  the  three  textbooks. 

Space  would  not  permit  more  than  a  summary  of  each 
of  these  tables. 

Table  XII. — Names  of  Men  Mentioned 


Explorers  and  discoverers 

Rulers,  presidents  and  governors 

Names  mentioned  in  connection  with  indus- 
try and  invention 

Names  of  statesmen 

Names  mentioned  in  reference  to  military 
matters 


Mace 


47 
87 

11 

29 

167 


Beard  and 
Bagley 


34 
69 

40 


76 


Gordy 


30 
66 

8 
44 

104 


*Data  not  extant. 


Table  XIII  gives  the  dates  that  had  a  frequency  of  five 
or  more  in  each  of  the  three  texts,  the  distribution  of 
dates  in  the  study,  "Possible  Defects  in  the  Present  Con- 
tent of  American  History  as  Taught  in  the  Schools,"  re- 
ported by  Horn  in  the  Sixteenth  Yearbook  of  the  National 
Society  for  the  Study  of  Education,  Part  I,  and  also  the 
study  made  by  Bagley  in  the  Fourteenth  Yearbook  in  the 
same  series. 
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Table  XIII.— Dates  Appearing  Five  or  More  Times 


Mace 

Beard  and  Bagley 

1 

Gordy 

Frequency  of 

Dates  Found 

by  Horn 

Dates  Ranked 
According  to 
Importance  in 
Bagley's  Study 

Date 

Fre- 
quency 

Date 

Fre- 
quency 

Date 

Fre- 
quency 

Date 

Fre- 
quency 

Date 

Rank 

1862 

16 

1860 

39 

1862 

17 

1900 

114 

1776 

1 

1812 

13 

1850 

25 

1812 
1900 

14 
14 

1890 

84 

1492 

2 

18&1 

12 

1017 

20 

1837 

13 

1850 

56 

1607 

3 

1860-63 

11 

1013 

19 

1861 

12 

1870 

53 

1789 

4 

1776 

9 

1861 

17 

1850 

11 

1825 

52 

1620 

5 

1865 

9 

1893 

52 

1837-50-87 

8 

1862 

16 

1763 
1820 
1807-60-64 

10 
10 
9 

1902 

43 

1803 

6 

1814 

7 

1865 

15 

1880 

40 

1861 

7 

1846 

7 

1750-63-77 

6 

1916 
1812 

14 
14 

1863-65-93 

8 

1910 

38 

1787 

8 

1844-73 

6 

1636-39-64-82 

5 

1816 

13 

1781 
182a-35-90 

7 
7 

1869 

33 

1863 

9 

1754-7*-94 

5 

1811-16-21-20 

5 

1825-28-47-72 

5 

187&-97 

5 

1776 

12 

1660 

6 

1860 

28 

1820 

10 

1864-70 

12 

1754-79-87-89 

6 

1910 

12 

1832 

6 

1900 

11 

1609-89 

1814-15-30 

1836-46-54 

1857-73-77 

1895-99 

1801 

5 
5 
5 
5 
5 
5 

1913 

27 

1812 

11 

1848-63 

10 

1848-82 

26 

1765 

12 

1763 

9 

1871 

22 

1783 

13 

1828-25-67 

9 

1880-96 

9 

1912 

9 

1836-40-95 

8 

1840 

20 

1865 

14 

1908 

8 

1787 

7 

1857 

14 

1857 

15 

1800-10-11 

7 

1844-72-90 

7 

1815-20-33 

6 

1825 

12 

1854 

16 

1837-46-47 

6 

1858-69-79 

6 

1887-88-92 

6 

1893 

6 

1909-11-14 

6 

1769-92 

5 

1830 

11 

1775 

17 

1806-01-14 

5 

1819-22-41 

5 

1854-66-68 

5 

1871-77-78 

5 

1897-98 

5 

1901-04-05-07 

5 

1790 
1800-03 
1819-20 
1794 

8 
8 
7 
4 

1781 

1823 
1846 

18 

19 
20 
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Table  XIII  is  read  as  follows:  The  date  1862  occurred 
16  times  in  Mace's  History;  the  date  1860  occurred  39 
times  in  Beard  and  Bagley;  the  date  1862  occurred  17 
times  in  Gordy;  the  dates  1844  and  1873  occurred  six 
times  in  Mace ;  the  dates  1848  and  1863  each  occurred  five 
times  in  Beard  and  Bagley,  and  so  on. 

The  following  facts  in  reference  to  the  dates  in  the  three 
history  texts  are  significant.  Mace's  History  contains  326 
different  dates;  Beard  and  Bagley,  235;  Gordy,  206.  It 
is  interesting  to  note  that  the  ranks  of  the  history  dates 
found  in  these  three  history  texts  do  not  agree  with  those 
found  by  Dr.  Horn  in  the  study  of  '  *  Thirty-Eight  Modem 
Crucial  Problems"  in  the  above-mentioned  study,  or  with 
the  dates  selected  by  specialists  in  American  history  and 
reported  bj''  Bagley  in  the  Fourteenth  Yearbook.  The  most 
important  date  found  in  Mace's  History,  as  far  as  fre- 
quency is  concerned,  is  1862;  in  Beard  and  Bagley,  1860; 
in  Gordy,  1862;  in  Horn's  study,  it  is  1900;  and  in 
Bagley 's  study,  1776. 


Table  XIV. — Amount  of  Space  Devoted  to  Political,  Social  and 
Economic,  and  Military  Phases  of  American  History 


Mace 

Beard  and 
Bagley 

Gordy 

Type  of  Material 

Pages 

Per 
Cent 

of 
Book 

Pages 

Per 

Cent 

of 
Book 

Pages 

Per 

Cent 

of 
Book 

Political  movements 

Social  and  economic  move- 
ments   

Military  movements 

Maps  and  illustrations. . .  . 

144.65 

150.00 
98.40 

83.75 

30.13 

31.28 
20.50 
17.40 

240.50 

155.37 
39.50 
97.95 

37.81 

24.43 

6.21 

15.40 

148.44 

63.22 
66.22 
80.33 

33.96 

14.46 
15.50 
18.38 

The  above  percentages  do  not  include  the  bibliographies, 
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review  questions,  and  other  repeated  materials  in  the  texts. 
One  significant  thing  about  these  texts  is  the  amount  of 
space  devoted  to  pictures  and  maps.  Some  of  the  books 
have  approximately  15  per  cent  of  their  space  devoted  to 
pictures.  Since  the  picture  cost  is  more  than  the  cost  of 
a  regular  printed  page  it  brings  the  approximate  cost  of 
tlie  pictures  up  to  20  per  cent  of  the  entire  cost  of  the 
book.  When  one-fifth  of  the  manufacturing  cost  of  a  book 
is  the  cost  of  reproducing  pictures,  one  should  be  pretty 
sure  that  the  pictures  are  valuable  and  will  be  generally 
used. 

If  the  contents  of  these  books  are  compared  with  the 
type  of  material  discussed  by  Horn,  cited  above,  it  will  be 
seen  that  in  the  former  relatively  much  less  space  is  de- 
voted to  social  and  economic  movements.  However,  it  was 
not  the  purpose  of  the  studies  to  determine  the  relative 
amount  of  space  that  ought  to  be  given  to  these  various 
phases  of  American  history  but  rather  to  determine  the 
amount  of  space  that  was  actually  being  given  in  these 
three  texts.  The  same  thing  may  be  said  in  reference  to 
the  other  phases  scored. 

II.    The  Measurements   of   the   Physical   Growth   of 
School  Children 

In  the  last  analysis,  education  may  be  reduced  to  the 
process  of  producing,  directing,  and  preventing  changes 
in  human  beings.  This  process  has  to  do  with  the  physical 
changes  as  well  as  the  mental.  It  is  a  platitude  to  say  that 
the  physical  and  physiological  traits  of  the  individual 
primarily  condition  his  total  nature ;  yet  the  exact  relation 
that  exists  between  the  physical  and  mental  development 
of  an  individual  is  not  known. 

Much  work  has  been  done  on  the  measurement  of  the 
physical  growth  of  children,  largely  because  some  scientists 
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have  believed  that  measurements  of  this  kind  would  in 
some  way  furnish  a  key  to  mental  development.  Hun- 
dreds of  thousands  of  school  children  have  been  measured 
to  determine  their  height,  weight,  vital  capacity,  reaction 
time,  and  other  physical  traits.  Norms  for  these  traits 
for  each  year  of  a  child's  life,  until  he  gets  far  into  the 
'teens,  have  been  established.  It  would  seem  that  there 
should  be  a  somewhat  definite  relation  existing  between 
the  development  of  the  nervous  system  and  the  mental 
maturity  of  an  individual;  that  frail  undersized  children 
should  be  taught  somewhat  differently  from  the  strong  and 
robust;  and  that  courses  of  study  should  be  reorganized 
to  take  cognizance  of  the  development  of  physical  traits 
of  scliool  children.  Yet  in  spite  of  the  hundreds  of 
thousands  of  measurements  that  have  been  made  on  these 
physical  traits,  little  change  that  takes  cognizance  of  the 
growth  and  degree  of  maturity  of  these  physical  traits 
has  been  made  in  the  organization  of  school  programs. 
Some  time  in  the  near  future  the  facts  that  have  been 
obtained  by  the  physical  measurements  of  school  children 
may  be  extremely  valuable  in  the  teaching  process.  To 
date,  however,  their  pedagogical  value  has  been  small.  To 
state  the  problem  concretely,  how  should  the  method  of 
the  recitation  or  the  tj'pe  of  subject-matter  presented  to 
children  who  are  exceptionally  well  developed  physically 
differ  from  that  presented  to  those  children  who  are 
scarcely  normal  as  far  as  physical  development  is  con- 
cerned? It  very  frequently  happens  that  those  who  are 
physically  inferior  exceed  those  whose  physical  develop- 
ment is  far  above  the  normal,  and  the  work  done  by  those 
physically  inferior  apparently  docs  not  injure  them.  Of 
course,  many  valuable  facts  relative  to  growth  have  been 
discovered  that  are  beginning  to  be  utilized  in  health  work 
among  children. 
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III.  The  Measurements  of  the  Money  Cost  of  Education 

So  many  measurements  have  been  made  in  this  field, 
and  the  literature  is  so  familiar,  that  a  bare  mention  of 
it  is  sufficient  here.  We  have  made  careful  records  of  the 
cost  of  school  buildings,  school  apparatus,  teachers' 
salaries,  and  other  expenses  incident  to  teaching. 
Recently  we  have  worked  out  norms  for  supervision,  teach- 
ing, janitor  service,  and  the  cost  per  clock-hour  to  teach 
the  various  subjects.  A  school  official  may  now  compare 
the  distribution  of  school  funds  in  his  city  with  that  in 
other  cities  and  determine  whether  he  is  paying  too  much 
or  too  little  for  supervision,  janitor  service,  or  any  other 
phase  of  the  school  work. 

Intensive  studies  have  been  made  in  school  costs  and 
accounting  of  school  work.  This  phase  of  measurements 
is  of  interest  primarily  to  the  department  of  school  super- 
vision rather  than  to  the  average  room  teacher. 

IV.  The  Measurements  of  School  Buildings 

The  score  card  for  the  measurement  and  standardization 
of  school  buildings  came  as  a  result  of  the  movement  for 
greater  efficiency  in  education.  The  first  one  made  its 
appearance  in  1916  and  was  made  under  the  direction  of 
Dr.  George  D,  Strayer  of  Columbia  University.  A  new 
Score  Card  for  City  School  Buildings  was  made  by  Strayer 
and  Engelhardt  in  1920.  This  card  has  two  special  pur- 
poses: (1)  The  scoring  of  school  buildings  in  the  light 
»/  a  school  building  program  to  he  developed  by  a  city. 
(2)  The  checking  of  plans  for  new  school  buildings.^  This 
score  card  has  grown  out  of  the  experience  in  evaluating 
more  than  1,000  school  buildings  and  the  study  of  school 

'  Score  Card  for  City  School  Buildings,  Teachers  College  Bulletin 
No.  10,  Jan.,  1920. 
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building  standards.  Like  many  of  the  scales  used  in  the 
measurements  of  school  achievements,  it  is  a  score  card 
based  on  the  combined  or  median  judgments  of  experts. 
A  hall  one  foot  too  narrow  is  not  wrong  in  the  same  sense 
that  an  addition  problem  is  wrong;  but  it  is  not  the  most 
efficient  precisely  because  it  is  the  combined  opinion  of 
competent  judges  that  it  should  be  one  foot  wider. 

When  one  considers  the  amount  of  money  that  is  spent 
in  the  erection  of  school  buildings,  he  wonders  why  an 
attempt  to  standardize  school  buildings  was  not  made 
sooner.  Individuals  judging  buildings  not  infrequently 
think  mainly  in  terms  of  two  or  three,  or  possibly  a  half 
dozen,  elements  that  seem  to  them  of  primary  importance 
and  often  neglect  other  parts  of  the  building  that  are  of 
equal  importance.  The  score  card  is  designed  to  include 
as  nearly  as  possible  all  these  details  that  go  to  make  up  a 
perfect  school  building.  In  assigning  weights  to  the 
various  elements  the  judgment  method  is  resorted  to.  The 
scoring  is  done  on  a  basis  of  1,000  points ;  that  is,  a  perfect 
building  scores  1,000.  The  principal  divisions,  or  headings, 
for  scoring  city  school  buildings  with  the  weights  assigned 
to  each  heading  are  given  below: 

1.  Site   125 

(a)  Location  55 

( b )  Drainage    30 

(c)  Size  and  form 40 

2.  Building   165 

(a)  Placement 25 

(b)  Gross  structure   60 

(e)  Internal  structure   80 

3.  Service  systems 280 

(a)  Heating  and  ventilating    70 

(b)  Fire  protection  system  65 

(c)  Cleaning  system 20 

(d)  Artificial  lighting  system   20 

(e)  Electric  service  system   15 

(/)  Water  supply  system   30 
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(^r)  Toilet  system 50 

(h)  Mechanical  service  system 10 

4.  Classrooms    290 

(a)   Location  and  connection    35 

(&)   Construction  and  finish   95 

(c)  Illumination 85 

(d)  Cloakrooms  and  wardrobes  25 

(e)  Equipment  50 

5.  Special  rooms    140 

(a)  Large  rooms  for  general  use 65 

(&)  Rooms  for  school  officials 35 

(c)   Other  special  service  rooms 40 

Each  of  the  above  headings  have  a  number  of  subhead- 
ings which  go  into  detail  as  to  the  various  parts  of  the 
buildings. 

The  score  card  for  the  measurement  of  school  buildings 
is  meeting  with  approval  everywhere.  Large  cities  that 
have  voted  down  school  bonds  for  improvement  and  repairs 
have  voted  twice  the  sum  called  for  when  the  buildings 
have  been  scored  and  the  exact  defects  made  known  to  the 
public. 

It  seems  evident  that  a  score  card  of  this  nature  will 
eventually  become  a  standard  for  the  erection  of  most 
school  buildings. 

V.  The  Measurements  op  Retardation,  Acceleration, 
AND  Elimination 

Measurements  of  retardation,  acceleration,  and  elimina- 
tion were  among  the  first  to  be  made  in  the  recent  general 
movement  in  educational  measurements  for  greater  school 
efficiency.  The  movement  may  be  said  to  have  started 
in  earnest  in  1904  when  Dr.  William  H.  Maxwell,  city 
superintendent  of  the  schools  of  New  York,  showed  in  his 
annual  report  that  39  per  cent  of  the  pupils  in  the  ele- 
mentary grades  were  above  the  normal  age  for  the  grades 
they  were  in. 
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This  startling  revelation  caused  school  officials  to  check 
up  their  school  systems  to  see  if  similar  conditions  pre- 
vailed elsewhere.  From  that  time  to  the  present  those 
in  charge  of  the  schools  in  every  state  in  the  Union  have 
been  measuring  the  amount  of  retardation  in  their  schools 
and  attempting  to  assign  causes  for  the  retardation  found. 

It  was  the  importance  of  this  problem  with  its  bearing 
on  the  question  of  the  adaptation  of  the  school  to  the  needs 
of  the  child,  and  the  almost  complete  lack  of  definite  in- 
formation bearing  on  the  question,  that  impelled  the 
Russell  Sage  Foundation  to  undertake  in  1907  an  investi- 
gation of  **some  phases  of  the  adaptability  of  the  school 
and  its  grades  to  children. ' '  ^  The  investigators  were  in- 
terested, not  in  the  individual  subnormal,  or  in  a  typical 
child,  but  rather  in  that  large  class,  varying  with  local 
conditions  from  5  to  75  per  cent  of  all  the  children  in 
our  schools,  who  were  older  than  they  should  be  for  the 
grades  they  were  in.  Data  gathered  for  this  study  seemed 
to  warrant  the  statement  that  at  least  6,000,000,  or  33  per 
cent,  of  the  pupils  in  the  public  schools  were  retarded. 

Thirteen  per  cent  more  retardation  was  found  among 
boys  than  among  girls.  The  percentage  of  girls  who  com- 
pleted the  common  school  course  was  17  per  cent  greater 
than  the  percentage  of  boys.  Studies  of  this  kind  brought 
out  the  fact  that  the  schools  as  then  organized  were  better 
fitted  for  the  needs  of  girls  than  they  were  for  the  needs 
of  boys. 

It  was  also  found  that  there  was  a  high  correlation  be- 
tween retardation  and  elimination.  In  those  schools  where 
the  pupils  were  greatly  retarded,  a  large  majority  did  not 
remain  to  finish  the  course. 

The  report  of  the  Commissioner  of  Education  in  1907 
shows  the  distribution  of  children  through  the  grades  in 

'  Leonard  P.  Ayres,  Laggards  in  Our  Schools  (Russell  Sage  Founda- 
tion, 1907),  p.  2. 
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886  cities  of  8,000  population  and  above.  From  this  report 
it  was  shown  that  for  every  1,000  pupils  entering  the  first 
grade,  the  second  grade  would  have  723,  the  third  grade, 
692,  the  fourth  grade,  640,  the  fifth  grade,  552,  the  sixth 
grade,  462,  the  seventh  grade,  368,  the  eighth  grade,  263, 
the  first  year  of  high  school,  189,  the  second  year  of  high 
school,  123,  the  third  of  year  high  school,  81,  and  the  fourth 
year  of  high  school,  56. 

Of  course  conditions  have  improved  since  1907  as  a  result 
of  the  studies  that  have  been  made  on  retardation  and 
elimination.  The  following  study  made  by  Ayres  is 
probably  typical  of  the  progress  that  is  being  made. 

A  little  more  than  ten  years  ago  the  Department  of 
Education  of  the  Russell  Sage  Foundation  made  a  coopera- 
tive study  of  200,000  school  children  in  29  city  school 
systems  to  determine  their  progress  through  the  grades.* 
In  the  spring  of  1920  the  same  schools  were  asked  to  re- 
})eat  their  earlier  study  in  order  that  some  estimate  could 
be  made  of  the  progress,  if  any,  attained  during  the  in- 
terim. Fifteen  cities  repeated  the  tests  using  the  same 
procedure  and  the  same  record  blanks  that  had  been  used 
in  the  first  test.  The  15  cities  repeating  the  test  had  83,283 
children  in  1911  and  111,680  in  1920. 

In  working  out  the  age-grade  tables  a  pupil  who  was 
seven  years  old  and  in  the  first  grade  was  considered 
of  normal  age  and  one  year  was  added  for  each  advanc- 
ing grade.  A  pupil  could  thus  be  classified  both  according 
to  his  age  and  his  progress.  In  regard  to  age,  he  was  either 
younger  than  normal,  normal,  or  older  than  normal.  With 
respect  to  his  progress  he  was  either  slow,  normal,  or  rapid. 
Since  both  age  and  progress  were  recorded  and  there  were 
three  groups  in  each  classification,   each   child  could  be 


« Leonard  P.  Ayres,  "The  Increasing  Efficiency  of  Our  City  School 
Systems,"  Elementary  School  Journal,  Vol.  21,  Feb.,  1921,  pp.  416-423. 
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assigned  to  any  one  of  nine  different  classes.     Table  XV 
shows  the  classification  for  each  100  pupils  in  1911. 


Table  XV. — School  Children  by  Young,  Normal,  and  Old,  and 

BY  Rapid,  Normal,  and  Slow  Groups,  Fifteen  Cities,   1911 

{After  Ayres) 


Young 

Normal 

Old 

Total 

Rapid 

Normal 

Slow 

6 
20 

2 

3 

21 

9 

2 
11 
26 

11 

52 
37 

Total 

28 

33 

39 

100 

Table  XV  is  read  as  follows :  Six  children  were  younger 
than  normal  for  their  grades  and  had  progressed  faster 
than  normal.  The  21  who  appear  in  the  second  column 
were  of  normal  age  and  made  normal  progress,  and  so  on. 

Table  XVI  shows  the  conditions  in  1920,  computed  on 
a  basis  of  100  pupils. 

Table  XVI. — School  Children  by  Young,  Normal,  and  Old,  and 

BY  Rapid,  Normal,  and  Slow  Groups,  Fifteen  Cities,  1920 

(After  Ayres) 


Young 

Normal 

Old 

Total 

Rapid 

Normal 

Slow 

10 

28 
2 

2 

23 

9 

1 

7 

18 

13 

58 
29 

Total 

40 

34 

26 

100 

The  data  of  Table  XVI  show  that  conditions  in  1920 
were  better  than  in  1911.  The  children  who  were  both 
young  and  making  more  than  normally  rapid  progress 
had  increased  from  six  in  each  100  to  ten.  Those  in  the 
center  of  the  table,   of  normal  age  and  making  normal 
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;  progress,  had  increased  from  21  per  cent  to  23  per  cent. 
The  most  important  change  is  that  in  the  figures  in  the 
k)wer  right-hand  corner  which  shows  that  the  unfortunate 
misfits  who  were  over  age  and  making  slow  progress  had 
diminished  from  26  per  cent  to  18  per  cent. 

During  the  nine  years  the  percentage  of  over-age  chil- 
dren had  fallen  from  39  to  26  and  the  proportion  of  slow 
pupils  from  37  in  each  100  to  29  in  each  100.     These 

'  improvements  are  large  and  important.  They  represent 
educational  economy,  financial  saving,  and  human  conser- 

I  vation.^ 

When  the  survey  movement  in  the  schools  started  about 
1912,  the  questions  of  retardation  and  elimination  received 
a  great  deal  of  attention.  A  great  deal  of  work  has  been 
done  in  this  field  by  Ayres,  Strayer,  Thorndike,  and  others. 
The  field  has  been  rather  completely  covered  in  state  and 
city  surveys  and  other  official  reports. 

The  question  of  the  bright  child,  the  accelerated  child, 
is  just  now  receiving  much  attention.  Educators  are  be- 
ginning to  realize  that  perhaps  the  bright  child  has  suffered 
from  our  ill-adapted  school  system  rather  than  the  dull 
one.  Besides,  it  is  the  bright  child  who  will  eventually 
become  a  leader  in  society,  a  moulder  of  public  opinion, 
and  hence  the  child  who  will  yield  the  largest  income  on 
the  investment  made.  Schools  are  accordingly  being  re- 
organized to  take  special  cognizance  of  the  bright  child. 
Large  sums  of  money  are  being  donated  to  educators  for 
special  work  in  this  field.  Large  cities  are  holding  sum- 
mer sessions  to  make  it  possible  for  the  bright  child  to 
gain  a  half  grade  or  even  a  grade  during  six  or  eight  weeks 
in  the  summer.  Special  classes  are  being  provided  for  him 
in  most  of  the  larger  school  systems  and  in  many  of  the 
more  progressive  smaller  ones. 

9  Ibid.,  pp.  418-419. 
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All  of  this  means  measurements.  The  bright  children 
are  selected  by  determining  their  general  intelligence  and 
school  achievements.  It  is  not  too  much  to  prophesy,  per- 
haps, that  the  chief  reorganization  of  the  school  will  be 
along  lines  which  will  take  cognizance  of  the  individual 
differences  among  children. 
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CHAPTER  IX 

EDUCATIONAL   STATISTICS,    GENERAL    STATEMENT 

The  mastery  of  one  more  set  of  tools  is  necessary  before 
the  educator  may  consider  himself  fully  equipped  to  speak 
intelligently  about  his  educational  processes  and  products. 
Twenty-seven  million  school  children  now  sit  at  the  feet 
of  750,000  teachers  to  receive  instruction.  The  annual 
cost  of  operating  this  great  institution  is  well  beyond  the 
billion  dollar  mark.  In  a  great  many  school  systems  more 
than  100,000  school  children  are  under  the  direction  of  a 
single  educator.  The  school  systems  have  grown  so  large 
and  there  are  so  many  individual  characteristics  and 
variations  that  affect  the  various  groups  that  it  is  impos- 
sible, without  the  aid  of  additional  tools  by  which  we  may 
compare,  contrast,  and  weigh  one  tendency  with  another, 
to  get  an  idea  of  the  general  movement  of  the  group  as  a 
whole. 

The  human  mind  is  so  constituted  that  it  cannot  image 
and  comprehend  a  large  number  of  distinct  impressions 
at  any  one  time.  For  example,  he  would  be  a  man  with 
exceptional  mnemonic  power  who  could  listen  to  the  read- 
ing of  two  lists  of  grades  of  100  each  that  were  made  by 
students  taught  by  two  different  methods  in  the  subject 
of  addition  and  tell  which  method  Avas  the  better  as  shown 
by  the  scores  made  by  the  pupils  in  each  group. 

The  power  needed  to  detect  the  movements  of  groups 
as  a  whole,  when  the  movements  of  the  individuals  within 
the  group  are  many  and  varied,  is  to  be  found  in  the 
elements  of  statistical  methods. 
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Use  of  Statistics  in  Other  Fields. — The  educator  is  not 
a  pioneer  in  this  field,  but  is  simply  following  the  lead 
of  the  biologist,  the  economist,  the  sociologist,  the  natural 
scientist,  and  others.  The  importance  of  statistical 
methods  applied  to  these  sciences  is  rarely  recognized. 
The  whole  doctrine  of  evolution  and  heredity  rests  in 
reality  on  a  statistical  basis.  It  is  in  this  direction  that 
the  most  important  new  work  of  a  statistical  nature  is 
being  done.  Out  of  the  great  number  of  observations, 
such  as  the  measurements  of  the  height  of  a  group  of 
men,  the  type  is  found,  that  is,  the  average  about  which 
all  the  measurements  are  grouped  according  to  some 
definite  law.  The  problem  is  then  to  determine  whether 
this  type,  or  the  groupings  about  it,  change,  and  in  what 
way.  The  differences  found  in  successive  generations  form 
the  data  on  which  arguments  as  to  evolution  and  develop- 
ment are  founded.  The  same  methods  apply  equally  to 
fossil  remains,  to  zoological  species,  and  other  organic 
forms.  If  this  method  were  neglected  many  valuable  argu- 
ments would  lose  their  force  and  theories  would  be  based 
on  personal  impressions  of  phenomena  instead  of  on  scien- 
tific measurements. 

In  certain  sciences  a  higher  degree  of  accuracy  is  re- 
quired than  is  possible  with  a  single  measurement.  For 
every  measurement,  however  apparently  absolute  it  may 
be,  is  a  relative  thing,  made  in  terms  of  something  else 
that  is  also  fallible ;  that  is  to  say,  that  is  subject  to  varia- 
tion. All  scientific  measurements,  in  other  words,  are  made 
in  terms  of  units  theoretically  invariable,  but  which  are 
always  practically  applied  by  means  of  a  mass  of  matter 
used  to  measure  with,  which,  itself,  is  of  necessity  a  more 
or  less  variable  quantity;  how  variable,  being  again,  in 
turn,  a  matter  of  determination.  Therefore,  many  measure- 
ments are  made  of  the  same  magnitude,  and  the  average 
taken  as  the  true  amount.     The  astronomer,  for  instance, 
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uses  statistical  methods  in  getting  the  position  of  the 
heavenly  bodies.  The  method  of  least  squares  ^  was  in- 
troduced by  him  in  locating  the  position  of  a  star  because 
he  was  anxious  to  choose  the  best  of  several  slightly  dis- 
crepant observations  of  the  position  of  the  star. 

The  Question  of  Error. — In  all  physical  and  biological 
observations  the  usual  method  is  to  take  several  measure- 
ments of  the  same  quantity  in  order  to  get  the  most  nearly 
accurate  result  obtainable  as  a  measure  of  the  thing  in 
question.  To  all  such  measurements  there  enters  the  factor 
of  experimental  error  due  to  a  number  of  causes  such  as 
environmental  conditions,  the  apparatus,  or  the  observer 
himself. 

The  errors  fall  in  two  general  classes:  They  are  (1)  the 
constant  errors,  which,  in  all  measures  of  the  same  quan- 
tity, made  with  the  same  care,  and  under  the  same  condi- 
tions, have  the  same  magnitude,  or,  whose  presence  and 
magnitude  are  due  to  some  fixed  cause;  and  (2)  the  so- 
called  accidental  errors  such  as  those  due  to  fatigue,  cold, 
nervousness,  poor  eyesight,  or  other  temporary  disability 
of  the  experimenter,  or  to  the  constitutional  bias  known 
as  the  personal  equation.  Some  errors,  of  course,  are  sim- 
ple mistakes,  as  in  reading  off  the  wrong  figure,  mistaking 
a  3  for  a  5,  for  instance. 

After  a  full  investigation  of  the  constant  errors  in  all 
physical  and  biological  measurements,  the  problem  then 
remains  of  combining  the  observations  so  that  the  remain- 
ing accidental  errors  shall  have  the  least  probable  effect 
upon  the  results,  and  it  is  to  bring  about  this  combination 
of  observations  that  we  employ  the  method  of  least  squares. 

Distribution  of  Measures  about  a  Point  of  Central  Tend- 
ency.— The    averages    obtained    by    different    scientists 


*  Used  to  locate  a  position  such  that  the  sum  of  the  squares  of  the 
distance  from  that  point  is  the  minimum. 
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from  the  same  series  of  biological,  sociological,  and  other 
observations  are  rarely  identical.  From  such  a  group  of 
measurements  it  is  necessary  to  deduce  the  most  probable 
estimate.  It  was  early  discovered  that  the  measurements 
obtained  by  different  scientists  measuring  the  same  thing 
showed  a  certain  definite  arrangement  in  accordance  with 
which  values  at  or  near  the  average  of  all  the  measures 
were  greatest  in  frequency ;  that  positive  errors  were  about 
as  frequent  as  negative  ones  of  the  same  magnitude;  and, 
that  large  errors  seldom  occur.  The  center  of  balance 
about  which  the  errors,  or  observations  fall,  is  known  as 
the  arithmetical  mean.  In  reference  to  the  arithmetical 
mean  as  a  measure  of  the  most  probable  value  of  a  series 
of  measurements  on  the  same  thing,  Merriman  says:^ 

The  most  probable  value  of  a  quantity  which  is  observed 
directly  several  times  with  equal  care,  is  the  arithmetical  mean 
of  the  measurements.  The  average,  or  arithmetical  mean,  has 
always  been  accepted  and  used  as  the  best  rule  for  combining 
direct  observations  of  equal  precision,  upon  one  and  the  same 
quantity  ...  if  the  measurements  be  but  two  in  number,  the 
arithmetical  mean  is  undoubtedly  the  most  probable  value;  and, 
for  a  greater  number,  mankind,  from  the  remotest  antiquity,  has 
been  accustomed  to  regard  it  as  such. 

For  example,  out  of  ten  discrepant  results,  it  is  impossible 
to  ascertain  the  true  value.  What  is  the  best  representative 
of  that  value?  Experience  has  shown  the  arithmetical 
mean  to  be  the  best,  that  is,  the  most  representative  value 
of  a  series  of  observations  made  under  the  same  conditions, 
all  being  equally  reliable.  The  arithmetical  mean  is  so 
regarded,  because  it  is  a  value,  the  deviations  from  which 
in  the  plus  and  minus  directions,  being  equally  probable, 
will  cancel  one  another.    Or  again,  if  one  hundred  judges 


*  A  Textbook  on  the  Method  of  Least  Sqimres  (New  York,  1913),  p.  22. 
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were  called  upon  to  judge  the  length  of  a  certain  room 
which  was  actually  80  feet  long,  it  would  be  found  upon 
tabulating  the  data  that  the  most  of  the  guesses  were  rela- 
tively close  to  80  feet,  and  as  the  guesses  deviated  farther 
and  farther  from  80  their  number  would  grow  consecu- 
tively less.  It  would  be  found  also  that  the  number  of 
guesses  that  exceeded  80  were  practically  equal  to  those 
that  were  less  than  80.  In  the  cases  just  mentioned,  and 
others  of  a  similar  nature,  the  mean  simply  represents  a 
center  of  equilibrium,  or  center  of  gravity,  as  it  were,  of 
the  variations  in  the  given  measurements.  Having  found 
that  center  of  equilibrium — the  arithmetical  mean — the 
next  problem  is  to  ascertain  what  amount  of  swing  or 
oscillation  there  may  be  on  either  side  of  this  center.  These 
are  the  variations,  which  correspond  to  the  errors  we  have 
referred  to,  in  the  making  of  physical  measurements.  The 
questions  arising  relative  to  variability  and  measures  of 
central  tendency  are  treated  more  at  length  in  subsequent 
chapters. 

The  graphic  representation  of  the  distribution  of  data 
in  reference  to  a  measure  of  central  tendency  as  discussed 
above  is  known  as  a  normal  or  prohahility  surface,  and  the 
curve  representing  it  is  known  as  a  normal  prohahility 
curve  discussed  in  a  previous  chapter. 

Measures  differing  from  the  average  as  discussed  above 
were  thought  of  as  being  in  error  and  this  gave  rise  to  the 
development  of  what  has  been  called  theory  of  error. 

Considerable  space  has  been  given  to  an  exposition  of 
the  significance  of  the  theory  of  error  in  this  general 
chapter  on  educational  statistics  because  a  correct  idea  of 
it  is  fundamental  to  the  discussion  that  follows. 

Educational  Measurements  Compared  with  Measure- 
ments in  Other  Fields. — We  have  thus  far  discussed  the 
question  of  error  where  a  large  number  of  measurements 
were  made  of  the  same  thing.    We  may  now  assume  that 
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there  is  an  analogy  between  a  series  of  measurements  of 
the  same  thing  and  a  series  of  single  measurements  of 
each  of  a  number  of  individuals  alike  in  some  important 
characteristics.  In  biology,  in  particular,  this  analogy  was 
found  to  work  very  well  but  was  less  applicable  to  economic 
data.  For  this  reason  there  have  grown  up  two  schools, 
the  one  adhering  to  the  doctrine  of  the  theory  of  error 
and  the  other  rejecting  it  in  the  main.  In  the  treatment 
of  educational  data  it  is  found  that  educational  measure- 
ments resemble  those  of  biology  in  their  structure,  that  is, 
that  the  theory  of  error  applies. 

As  investigations  in  new  fields  of  science  were  constantly 
being  made  and  as  the  data  became  more  complex,  the  need 
for  better  statistical  methods  became  more  imperative.  Re- 
fined statistical  inquiries  could  not  be  conducted  by  the 
crude  and  cumbersome  machinery  then  in  operation.  As 
a  result  of  this  need  the  development  of  the  pure  theory 
of  statistics  has  had  a  remarkable  growth  within  the  last 
two  or  three  decades.  Such  men  as  Francis  Edgeworth, 
August  Meitzen,  Francis  Galton,  Edward  L.  Thorndike, 
Karl  Pearson,  G.  Udny  Yule,  and  Charles  B.  Davenport 
have  contributed  in  the  field  of  biological  statistics;  and 
Arthur  L.  Bowley,  Jacques  Bertillon,  R.  H.  Hooker, 
Thomas  S.  Adams,  and  Warren  Persons  have  each  aided 
in  establishing  statistical  methods  in  the  field  of  economics. 

In  dealing  with  large  numbers  descriptive  of  groups  in 
any  field  it  is  found  that  special  methods  become  necessary ; 
methods  that  depend  on  the  peculiar  properties  of  large 
numbers ;  methods  that  are  suitable  for  describing  complex 
groups  so  they  can  be  easily  comprehended;  methods  for 
analyzing  the  accuracy  of  statements,  for  measuring  the 
significance  of  differences,  and  for  comparing  one  estimate 
with  another.  All  of  these  fall  within  the  scope  of  sta- 
tistics. Without  the  aid  of  statistical  methods,  we  simply 
have  large  numbers  and  groups  of  numbers  from  which 
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no  logical  deductions  can  be  drawn.  It  must  be  borne  in 
mind,  however,  that  statistical  methods  in  themselves  prove 
nothing.  The  methods  selected  for  use  in  a  particular 
situation  must  agree  with  the  logic  and  other  non-quan- 
titative facts  of  that  situation.  When  thus  used,  statistical 
methods  aid  us  in  refining  our  thinking  about  complex 
masses  of  data  and  also  in  refining  our  methods  of  expres- 
sion. Bowley  says :  ' '  The  proper  function  of  statistics,  in- 
deed, is  to  enlarge  individual  experience. ' '  ^  Because  sta- 
tistics measure  only  the  numerical  aspects  of  a  phenomenon 
they  should  be  brought  into  relation  to  the  personal, 
political,  aesthetic,  and  other  non-quantitative  considera- 
tions that  may  be  of  greater  importance  in  deciding  on  a 
course  of  action. 

Quantities  Measured  Indirectly. — Statisticians  must 
very  often  content  themselves  with  measuring,  not  the  facts 
they  wish,  but  some  allied  quantity,  since  it  is  frequently 
the  ease  that  the  quantity  about  which  knowledge  is  desired 
is  not  capable  of  numerical  measurements.  We  cannot 
measure  health,  crime,  or  poverty,  for  instance.  We  can 
measure  only  death-rate,  the  number  of  convictions,  or  the 
number  of  persons  who  receive  public  relief.  Many  facts 
in  other  fields  are  thus  measured. 

Few  important  actions  can  be  taken  by  a  modern  govern- 
ment or  even  a  modern  corporation  without  a  statistical 
study  of  the  conditions  of  the  field  in  question.  Statistical 
results  are  essential  when  judgments  are  to  be  formed  on 
any  question  which  involves  numbers,  quantities,  or  values ; 
but  they  should  always  be  used  with  discretion  and  care. 
"The  most  important  function  of  statistics,"  says  Bowley, 
"is  evidence  to  show  the  relation  of  one  group  of  phenomena 
to  another."*  The  information  obtained  is  presumedly 
intended  as  a  guide  for  action. 

'  An  Elementary  Manual  of  Statistics,  p.  8. 
*  Ibid.,  ip.l. 
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Definition  of  Statistics. — Statistics  has  been  defined  as 
the  science  of  averages.  They  render  the  meaning  of 
masses  of  figures  clear  and  comprehensible  at  a  glance. 
They  give  a  bird's-eye  view  of  a  situation  involving  a 
complex  series  with  numerous  cases  in  such  a  way  that 
we  get  a  picture  of  the  series  as  a  whole.  They  have  to 
do  with  movements  of  groups  as  a  whole.  They  refer  to 
a  large  mass  of  facts,  or  data,  that  bear  upon  some  human 
problem.  This  is  one  of  the  most  important  principles 
involved  in  statistical  methods.  The  individual  members 
of  a  group  change  rapidly  while  the  whole  group  changes 
slowly.  It  is  impossible,  for  instance,  to  follow  or  measure 
the  motions  of  the  separate  atoms  of  a  body,  but  com- 
paratively easy  to  measure  the  motion  and  state  the  laws 
governing  the  movements  of  a  body  as  a  whole.  When  we 
wish  to  obtain  a  measurement  of  a  group,  peculiarities  of 
individuals  receive  little  attention.  It  is  only  when  the 
same  peculiarities  are  possessed  by  a  considerable  number 
of  persons  or  things  that  they  become  of  importance  and 
are  taken  into  account. 

Statistics  are  numerical  statements  of  facts  in  any 
department  of  inquiry  placed  in  relation  to  each  other ;  sta- 
tistical methods  are  devices  for  abbreviating  and  classify- 
ing the  statements  and  making  clear  the  relations.  Statis- 
tics are  almost  always  comparative.  They  show  the 
relative  importance,  the  very  thing  an  individual  is  most 
likely  to  misjudge.  The  absolute  magnitude  of  a  quantity 
is  of  little  meaning  until  we  have  some  similar  quantity 
with  which  to  compare  it.  The  object  of  a  statistical 
estimate  of  a  complex  group  is  to  present  an  outline,  to 
enable  the  mind  to  comprehend  with  a  single  effort  the 
significance  of  the  whole. 

Statistics  deal  primarily  with  variable  quantities.  A' 
variable  is  a  quantity  that,  under  the  conditions  imposed, 
may  assume  different  values  throughout  a  discussion.     In 
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education,  where  we  make  numerous  single  measurements 
of  different  pupils  grouped  together  on  the  basis  of  some 
common  characteristic,  each  different  value  constitutes  a 
value  of  the  variable.^ 

Laws  of  Statistical  Regularity. — One  of  the  most  valu- 
able contributions  of  modern  scientific  statistics  is  that 
it  has  succeeded  in  giving  us  a  sufficient  picture  of  a 
group  of  objects  without  going  through  the  laborious  and 
expensive  process  of  a  complete  enumeration  of  all  the 
items  in  the  group.  Thus,  it  is  by  no  means  necessary, 
in  ascertaining  the  average  wage  of  American  M'orking- 
men,  to  obtain  data  regarding  each  man  at  work.  If 
certain  typical  instances  are  taken  and  properly  averaged, 
the  difference  of  this  average  from  the  true  average  wage 
of  all  workingmen  is  likely  to  be  such  a  small  quantity 
as  to  be,  for  all  practical  purposes,  negligible. 

In  a  similar  way  the  anthropologist  can  discover  the 
physical  characteristics  of  a  tribe  or  race  by  taking  care- 
ful measurements  of  only  a  small  minority  of  the  whole. 
This  is  due  to  a  law  of  nature  formulated  on  the  mathe- 
matical theory  of  probabilities,  that  a  moderately  large 
number  of  items  chosen  at  random  from  among  a  very 
large  group  are  almost  sure,  on  the  average,  to  have  the 
characteristics  of  the  larger  group.  Thus,  if  two  persons 
blindfolded  were  to  pick  here  and  there  300  ears  of  corn 
each  from  a  bin  containing  1,000,000  ears,  the  average 
length  of  the  ears  picked  by  each  person  would  be  almost 
identical  even  though  they  varied  considerably  in  length. 
It  must  not  be  inferred  from  the  above  that  any  number 
of  samples,  no  matter  how  large,  will  give  exactly  the 
same  results  as  would  be  obtained  by  the  use  of  the  entire 
mass  of  data.     The  probability  of  error  diminishes  con- 

^  B.  R.  Buckingham,  "Statistical  Terms  and  Methods,"  National 
Society  for  the  Study  of  Education,  Seventeenth  Yearbook,  Part  II, 
p.  115. 
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stantly  as  the  number  of  items  used  increases.  If,  then, 
only  a  few  sample  items  are  used,  the  chance  error  is 
likely  to  be  so  large  as  to  seriously  vitiate  the  results; 
but  as  the  number  of  samples  chosen  grows  larger,  the 
error  diminishes  until  it  eventually  becomes  negligible. 

Methods  of  Statistics,— The  quantitative  study  of  edu- 
cation reveals  two  principal  methods  of  treating  measure- 
ments of  human  traits  and  other  educational  data.  On 
the  one  hand,  the  observer  may  note  only  the  presence  or 
absence  of  an  attribute  or  trait.  For  example,  if  98  de- 
grees Fahrenheit  is  considered  a  normal  temperature  for 
an  adult  human  being,  and  if  an  individual  who  has  a 
temperature  higher  than  that  is  said  to  have  a  fever,  then 
an  observer  may  examine  individuals  to  see  whether  or 
not  this  attribute,  the  fever,  is  present.  The  quantitative 
character  in  this  case  arises  solely  in  the  counting  of  the 
number  of  individuals  who  possess  this  attribute.  The 
method  by  which  we  treat  statistics  collected  in  this  way 
has  been  defined  by  Yule  as  the  statistics  of  attributes. 

On  the  other  hand,  the  observer  may  want  to  know,  not 
only  as  to  the  presence  or  absence  of  the  attribute,  but 
how  much  of  the  attribute  is  present.  In  the  case  cited 
above,  he  may  want  to  know  how  much  fever  each  in- 
dividual has.  This  method  of  refining  statistics  and  meas- 
uring the  actual  magnitude  of  variable  attributes  is  known 
as  the  statistics  of  variables.  This  is  the  method  usually 
employed  in  educational  research.  For  example,  we  want 
to  know,  not  only  whether  or  not  there  is  retardation  in 
our  schools,  but  how  much  retardation ;  not  only  how  many 
pupils  made  grades  above  the  passing  marks,  but  how  much 
above  the  passing  marks ;  not  simply  that  there  is  elimina- 
tion from  school,  but  how  much  elimination;  and  so  on. 

This  method  implies  that  the  magnitude  of  the  attribute 
or  characteristic  has  been  measured  with  reference  to  some 
scale  made  up  of  units. 
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Limitations  of  Statistics. — Statistics,  while  extremely 
useful  to  the  investigator  in  almost  every  line  of  scientific 
inquiry,  have  limitations  and  shortcomings  which  cannot 
be  overcome.  Statistics  deal  largely  with  averages  and 
these  averages  may  be  made  up  of  individual  items 
radically  different  from  each  other.  In  the  average  these 
irregularities  are  swallowed  up.  But  statistics,  from  their 
very  nature,  cannot  and  never  will  be  able  to  take  into 
account  individual  cases.  The  difference  between  arith- 
metic and  statistics  is  that  the  former  attains  exactness 
while  the  latter  deals  with  estimates. 

Standard  of  Accuracy. — While  in  the  physical  sciences 
very  great  accuracy  of  measurements  is  practicable,  this 
is  far  from  being  true  in  the  ease  of  social  phenomena. 
In  this  field  a  multitude  of  sources  of  error  are  ever 
present,  many  of  which  can  be  eliminated  by  no  degree 
of  care.  Fortunately  for  the  statistician,  however,  small 
errors  are  often  negligible  and  in  no  way  obstruct  the 
solution  of  the  given  problem.  Attempts  to  attain  the 
greatest  possible  degree  of  accuracy  are  frequently  merelj^ 
waste  of  time.  It  might  be  possible  to  measure  the  cus- 
toms revenue  of  the  United  States  to  the  nearest  cent, 
for  instance,  but  for  ordinary  purposes  cf  statistical  com- 
parison, such  action  is  not  only  superfluous  but  positively 
confusing  to  the  mind,  in  as  much  as  the  addition  of  extra 
figures  directs  the  attention  from  the  fundamental  digits. 

Compensating-  vs.  Cumulative  Errors. — The  accuracy  of 
the  final  results  depends  very  largely  on  whether  the 
errors  are  compensating  or  accumulative.  If  different 
people  were  to  estimate  the  length  of  a  given  line  the 
chances  are  that  as  many  people  would  estimate  it  too 
long  as  too  short.  The  errors  in  measuring  a  line  made 
by  a  pair  of  chainmen,  because  of  stretching  the  chain 
too  tight  or  not  taking  up  the  slack  sufficiently,  would 
tend  ill  the  long  run  to  offset  one  another.     In  cases  of 
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this  kind  the  errors  are  said  to  be  compensating.  On  the 
other  hand,  if  the  chain  used  by  the  above-mentioned 
surveyors  were  too  short,  the  longer  the  line  measured  the 
greater  the  error  would  be.  The  last  case  mentioned  shows 
the  effects  of  accumulative  errors. 

Discrete  and  Continuous  Series. — Quantities  to  be 
measured  may  be  in  either  a  discrete  or  a  continuous  series. 
A  discrete  series  is  one  with  gaps.  It  is  made  up  of  a 
number  of  integers,  as  the  number  of  words  in  a  spelling 
test,  or  the  number  of  children  in  a  class.  There  are  either 
20  words  in  a  spelling  test,  or  21,  or  some  other  integral 
number.  There  are  never  20^  or  20f  words  in  the  test ; 
neither  are  there  19  ^  children  in  the  class.  In  each  case 
the  number  is  an  integer. 

A  continuous  series  is  one  that  does  not  contain  gaps 
and  is,  in  theory,  capable  of  any  degree  of  subdivision. 
Jlost  mental  traits  and  social  facts  belong  to  this  series. 
In  actual  measurements,  a  given  measure  of  a  continuous 
series  does  not  mean  a  single  point  on  the  scale,  but  a 
distance  along  the  scale  between  two  limits.  For  instance, 
when  we  say  that  an  athlete  runs  100  yards  in  10^  seconds, 
we  do  not  mean  that  it  was  exactly  IO/q  seconds  but  that 
it  was  at  least  10^  and  less  than  10^.  A  more  delicate 
recording  instrument  might  have  recorded  the  time  as 
^^Twu  seconds,  or  as  lOj^o^,  and  so  on,  depending  on  the 
delicacy  of  the  instrument. 

Undistributed  Measures. — The  fact  that  many  of  our 
marks  and  measures  in  education  are  indefinite  and  undis- 
tributed leads  to  a  great  many  errors  in  educational  sta- 
tistics. A  few  illustrations  will  make  this  point  clear.  In 
a  continuous  series  the  measure  zero,  which  should  be  a 
definite  distance  on  the  scale,  a  measure  somewhere  be- 
tween two  limits,  is  quite  indefinite  and  very  confusing 
when  used  as  a  point  of  reference  in  statistics.  Unless  the 
statistician  defines  what  he  means  by  zero,  correct  reckon- 
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ing  in  reference  to  it  is  impossible.  Thus  zero  may  mean 
from  minus  0.5  to  plus  0.5  or  from  0  to  1  or  some  other 
measure  on  the  scale.  It,  many  times,  means  a  distance 
on  the  scale  from  a  point  above  just  not  any  of  the  thing 
to  be  measured  to  an  indefinite  distance  below.  If  ten  boys 
were  given  an  examination  in  arithmetic  and  the  test  con- 
sisted of  ten  problems,  a  boy  who  failed  to  solve  any  of 
the  problems  would  be  marked  zero,  and  unless  otherwise 
explained  this  mark  would  mean  anything  less  than  one 
to  an  unknown  lower  extreme  The  boy  who  solved  six 
problems  is  scored  six ;  but  a  score  of  six  unless  otherwise 
explained  means  as  much  as  six  and  less  than  seven.  These 
points  should  be  carefully  noted  to  prevent  confusion  in 
finding  medians  and  other  measures  of  central  tendency 
and  variation  in  subsequent  pages. 

Rules  for  Tabulating  Data. — The  first  thing  that  the 
statistical  investigator  must  decide  is  the  exact  nature  of 
the  problem  that  he  desires  to  solve.  The  first  essential 
is  to  make  the  problem  definite  and  clear-cut.  The  next 
problem  is  the  arrangement  of  the  data  in  a  frequency 
distribution.  At  first  thought  it  would  seem  to  be  one 
of  the  simplest  things  in  the  world  to  construct  a  frequency 
table  for  recording  data;  but  the  beginner  who  attempts 
to  tabulate  a  complex  group  of  figures  will  quickly  discover 
that  the  simplicity  of  the  operation  is  far  more  apparent 
than  real.  In  fact,  when  a  scientific  tabulation  has  once 
been  made,  it  is  often  found  that  a  large  share  of  the  work 
of  analysis  is  completed. 

In  beginning  a  tabulation  the  first  question  that 
arises  is  whether  to  put  the  figures  in  one  or  in  several 
tables.  A  single  table  has  the  merit  of  completeness,  and 
the  data  are  thus  brought  into  proximity.  The  table,  how- 
ever, if  too  large,  becomes  confusing  to  the  eye  and  there 
is  great  difficulty  in  following  the  lines  and  columns  at 
a  glance.     Each  table  should  be  a  unit.     Rarely  should 
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one  attempt  to  demonstrate  in  the  same  table  several  com- 
parisons of  different  natures.  Another  matter  to  decide  is 
whether  the  table  shall  show  absolute  figures,  or  per- 
centages, or  both.  The  number  of  separate  headings,  or 
columns,  is  a  third  query  which  must  be  answered.  The 
more  minute  the  subdivisions,  the  greater  is  the  accuracy 
obtained.  On  the  other  hand,  the  multiplicity  of  headings 
prevents  the  proper  emphasis  being  given  to  the  main  facts 
and  tendencies  shown  by  the  statistics. 

General  Directions  for  Making  a  Scale  and  Curve-Plot- 
ting-.— Scales  and  distribution  tables  are  necessary  in 
statistics  for  two  reasons : 

1.  The  object  of  the  scale  may  be  to  present  graphically 
a  vivid  picture  of  the  general  distribution  of  the  facts 
relative  to  a  given  problem.  One  of  the  most  common  ways 
of  representing  these  facts  so  that  the  eye  will  catch  their 
general  trend  at  a  glance  is  to  plot  the  curve.  This  is  done 
in  the  following  manner :  A  horizontal  straight  line  is  first 
drawn,  and  points  are  located  at  equal  distances  on  this 
line.  At  the  left  end  of  the  line  a  perpendicular  is  erected, 
and  points  are  laid  off  in  a  similar  manner  on  this  line. 
The  two  series  of  points  are  called  the  scales.  It  is  usual 
to  call  the  point  where  the  perpendicular  line  intersects 
the  horizontal  line  the  origin  of  coordinate,  which  is 
designated  by  0.  The  horizontal  line  is  usually  designated 
by  OX  and  is  called  the  X-axis.  The  vertical  line  is 
designated  by  OY  and  is  called  the  Y-axis.  Distances 
along  the  X-axis  are  spoken  of  as  X-distances  or  X-coordi- 
nates,  and  distances  along  the  Y-axis  as  Y-distances  or 
Y-coordinates.  In  making  a  distribution  table  and  plot- 
ting curves,  lines  may  be  drawn  parallel  to  the  X-axis 
through  the  points  located  on  the  Y-axis,  and  lines  may 
be  similarly  drawn  through  the  points  on  the  X-axis 
parallel  to  the  Y-axis,  or,  the  curve  may  be  plotted  with- 
out drawing  these  additional  lines.     The  curve  is  more 
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easily  analyzed,  however,  if  these  additional  lines  are 
drawn. 

The  procedure  in  plotting  a  curve  may  be  illustrated  as 
follows : 

Lay  off  on  the  X-axis  distances  equal  to  the  magnitude 
of  the  part  or  trait  measured,  and,  at  the  respective  dis- 
tances representing  each  magnitude,  erect  perpendiculars 
to  the  X-axis.  Similarly  plot  the  corresponding  traits  on 
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Figure  VI.     plotting  of  a  curve 


the  Y-axis  and  erect  perpendiculars.  The  intersections 
of  the  perpendiculars  thus  erected  constitute  the  desired 
curve. 

2.  The  second  reason  for  making  scales  and  distribution 
tables  is  to  arrange  the  measures  so  as  to  facilitate  compu- 
tation. 

Designating  the  Class  Intervals, — Different  methods  are 
used  in  designating  the  class  intervals.    In  some  instances 
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the  class  interval  is  designated  by  its  mid-point.  If  the 
heights  of  individuals  are  under  consideration,  for  instance, 
and  the  steps  are  60,  62,  64,  66,  68,  etc.,  60  may  mean 
from  59  to  61,  and  62  may  mean  from  61  to  63,  etc.  In 
the  Courtis  arithmetic  tests  the  size  of  the  step  is  1  prob- 
lem. It  is  not  designated  by  the  middle  of  the  step,  but 
by  its  lower  limit.  For  example,  6  problems  does  not 
mean  from  5.5  problems  to  6.5  problems,  but  means  from 
6  to  7  problems,  and  7  problems  means  from  7  to  8  prob- 
lems, etc.  That  is,  a  pupil  would  get  credit  for  only  7 
problems  even  though  the  eighth  one  were  almost  com- 
pleted. 

When  the  limits  of  the  steps  are  not  clearly  designated, 
the  computation  is  difficult  to  follow.  It  is,  therefore,  a 
good  policy  for  the  beginner  to  state  clearly  what  his 
class  intervals  are  before  he  begins  the  computation. 

In  any  class  interval  there  may  be  many  measures.  The 
measures  are  spoken  of  as  the  frequencies  of  the  class  inter- 
vals, and  the  total  frequency  is  the  sum  of  all  the  class 
frequencies. 

Analysis  of  Results. — The  results  disclosed  by  a  dis- 
tribution table  are  seldom  fully  revealed  at  a  glance.  Much 
is  therefore  added  to  the  value  of  a  table  if  it  is  accom- 
panied by  a  written  analysis  which  points  out  the  principal 
conclusions  which  may  be  deduced  therefrom,  the  possible 
errors  involved,  and  the  probable  causes  of  the  phenomena. 
The  power  to  analyze  a  table,  interpret  the  results  cor- 
rectly, and  state  the  conclusions  lucidly  and  succinctly  is 
one  of  the  characteristics  indispensable  in  a  good  statis- 
tician. 

In  studying  things  of  the  same  variety  the  work  may 
usually  be  facilitated  by  dividing  the  items  into  classes. 
The  simplest  mode  of  classification  is  to  group  all  the  in- 
stances under  two  headings,  the  determining  factors  being 
whether  they  do,  or  do  not,  possess  a  given  characteristic. 
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Thus  we  may  classify  people  as  sane  or  insane,  workmen 
as  employed  or  idle,  flowers  as  white  or  colored,  men  as 
short  or  tall,  and  so  on. 

For  some  purposes  this  division  by  dichotomy,  or  cutting 
in  two,  may  be  most  satisfactory  but  in  many  cases  the 
difficulty  arises  that  there  is  no  distinct  dividing  line. 
Thus,  it  is  impossible  to  say  at  just  what  point  a  man 
ceases  to  be  short  and  becomes  tall.  It  is  therefore  neces- 
sary to  lay  off  arbitrarily  a  line  of  demarcation  between 
the  two  classes.  But  if  classes  are  to  be  thus  arbitrarily 
established,  it  is  often  much  more  advantageous  to  set  up 
a  large  number  of  them  rather  than  only  two.  In  practice 
this  is  usually  done  by  dividing  the  whole  group  into 
classes  of  equal  width.  Thus,  if  the  tallest  trees  in  a  group 
are  39  feet  and  the  shortest  16,  and  it  is  desired  to  divide 
the  entire  group  into  five  classes,  the  boundary  lines  would 
preferably  be  fixed  on  the  round  numbers  15,  20,  25,  30, 
35,  and  40.  These  boundary  lines  are  known  as  class 
limits,  and  the  distance  between  the  two  limits  of  any 
class  is  designed  as  a  class  interval.  In  the  case  cited 
above,  5  would  be  the  class  interval.  A  table  formed  by 
thus  dividing  the  group  into  a  number  of  smaller,  more 
homogeneous  classes,  and  indicating  the  number  of  items 
found  in  each  class  is  known  as  a  frequency  table.  The 
number  of  items  falling  within  the  given  class  constitutes 
the  size  of  that  class  or  the  frequency. 

The  Need  for  Understanding-  Statistical  Formulas, — 
The  real  scientist  must  know  his  tools.  He  owes  it  to  the 
science  and  to  the  persons  who  may  accept  his  results  to 
be  quite  familiar  with  his  tools.  The  blind  application 
of  formulas  in  statistics  has  been  encouraged  by  the  con- 
venient manuals  that  are  available  and  by  the  fact  that 
the  theory  has  been  surrounded  by  intricate  and  involved 
mathematics  so  that  the  non-mathematical  student  had 
great  difficulty  in  interpreting  them.    There  is  little  doubt 
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that  the  failure  of  many  books  on  statistical  methods  to 
set  forth  the  fundamental  principles  involved  in  the  treat- 
ment of  statistical  data  has  done  much  to  hinder  the  prog- 
ress in  the  use  of  statistics  in  education.  The  necessary 
mathematics  is  largely  elementary  arithmetic,  and,  with 
a  few  exceptions,  there  is  no  need  for  higher  mathematics. 
Special  effort  has  been  made  in  this  volume  to  present 
the  fundamental  principles  of  statistics  simply  and,  as  far 
as  possible,  in  non-mathematical  language.  Practically  all 
higher  mathematics  has  been  eliminated. 


CHAPTER  X 

THE   MEASUREMENTS   OF    CENTRAL   TENDENCY   OR   AVERAGES 

The  previous  chapters,  dealing  with  the  general  nature 
and  use  of  statistics,  the  methods  of  organizing  materials 
in  the  form  of  frequency  distributions,  and  the  definitions 
and  illustrations  of  statistical  terms,  have  prepared  us 
to  take  up  the  various  methods  of  statistically  treating 
the  distributions  thus  organized.  It  will  soon  become  clear 
that  the  organization  of  material  into  frequency  distri- 
butions is  but  a  preliminary  step  to  a  more  concise  descrip- 
tion and  representation  of  it  by  analyzing  the  size,  number, 
and  position  of  the  various  items  that  make  up  the  dis- 
tribution. 

Many  things  about  the  nature  of  the  distribution  must 
be  known  before  comparisons  with  other  distributions  may 
be  scientifically  made.  The  limitations  of  the  various 
averages  must  be  fully  recognized.  The  amount  of  dis- 
persion about  the  averages  must  be  known,  and  the  reli- 
ability of  the  measures  determined. 

It  has  been  indicated  in  a  previous  chapter  that  we  are 
primarily  interested  in  comparative  values  rather  than  in 
absolute  values.  In  fact,  a  thing  cannot  be  evaluated  until 
it  is  compared  with  something  else.  Statistical  data  can- 
not be  compared  until  they  are  expressed  in  terms  that 
properly  represent  the  entire  mass.  There  are  three  prin- 
cipal ways  of  describing  statistical  data  in  the  form  of 
frequency  distributions,  that  is,  there  are  three  general 
types  of  measurements  of  statistical  data.  They  are:  (1) 
measurements    of    central    tendency,    or    averages;    (2) 
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measurements  of  dispersion  or  variability;  and  (3) 
measurements  of  correlation.  The  first  of  these  will  be 
discussed  in  the  present  chapter  and  the  other  two  in  the 
two  succeeding  chapters. 

Averages. — The  word  average  has  a  very  indefinite 
meaning  in  common  parlance.  The  public  uses  the  term 
very  loosely.  Generally  speaking,  the  word  average  as 
used  by  the  public  may  mean  one  of  two  things:  It  may 
mean  the  most  frequent  measure  in  the  group,  ''the  gen- 
eral run,"  the  typical  measure,  as  the  average  clerk,  the 
average  teacher,  the  average  size  city,  the  average  Amer- 
ican; or,  it  may  mean  a  different  thing  altogether,  illus- 
trated by  what  the  farmer  has  in  mind  when  he  says  that 
his  hogs  averaged  210  pounds,  or  his  wheat  averaged  14.7 
bushels  to  the  acre. 

In  the  second  interpretation  of  the  average,  it  may  be 
that  not  a  single  hog  in  the  drove  actually  weighted  210 
pounds,  nor  a  single  acre  actually  yielded  14.7  bushels  of 
wheat ;  hence  this  average  is  very  different  from  the  other 
which  is  the  typical  measure  in  the  series. 

The  first  average  illustrated  above  is  more  specifically 
known  as  the  mode,  and  the  latter  as  the  arithmetic  mean, 
or  simply  the  mean.  A  discussion  of  the  characteristics 
and  limitations  of  these  averages  and  also  those  of  another 
average  called  the  median  will  be  taken  up  at  some  length. 

The  Arithmetic  Mean. — The  arithmetic  mean  may  be 
defined  as  the  sum  of  all  the  measures  in  a  distribution 
divided  by  the  numier  of  measures.  It  is  represented 
by  the  formula 

^  fm 


M  = 


N 


where  M  represents  the  arithmetic  mean  of  the  distribu- 
tion, S  indicates  that  the  products  of  fm  are  to  be  added, 
m  =the  value  of  any  measure,  /  =  the  number,  or  fre- 
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quency,  of  the  measure  of  a  given  value,  and  N  the  total 
number  of  measures.  Table  XVII  illustrates  the  simple 
computation  of  the  arithmetic  mean. 

Table  XVII. — Grades  in  Per   Cent  Made   by  Ten  Students  in 

First-Year  Algebra 

(Hypothetical  Case) 


Pupil 

Grade 

Frequency 

FrequencyX  Measures 

m 

/ 

fm 

A 

84 

84 

B 

91 

91 

C 

68 

68 

D 

79 

79 

E 

95 

95 

F 

93 

93 

G 

87 

87 

H 

85 

85 

I 

90 

90 

J 

92 

92 
10)864     =S/m 
86 . 4  =  the  mean 

In  the  treatment  of  statistical  data  the  distributions  are 
rarely  so  simple  as  in  the  above  illustration.  Instead  of 
having  one  student  make  a  grade  of  84,  another  91,  and 
so  on,  it  would  be  more  probable  to  have  several  students 
making  grades  of  84,  91,  etc.,  ranging  all  the  way  from 
about  60  per  cent  up  to  100  per  cent. 

Table  XVIII  is  more  nearly  representative  of  the  way 
data  are  found  and  treated  statistically. 

Table  XVIII  illustrates  a  distribution  where  each  score 
is  made  by  more  than  one  pupil;  that  is,  a  score  of  4  is 
made  by  three  pupils,  a  score  of  5  is  made  by  four  pupils, 
a  score  of  6  by  21  pupils  and  so  on.  Since  there  is  more 
than  one  person  making  each  of  the  various  scores,  the 
distribution  is  said  to  be  weighted,  and  the  arithmetic 
mean  found  from  such  a  distribution  is  called  a  weighted 
arithmetic  mean. 
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Table  XVIII. — Distribution  of  Scores  on  614  Samples  of  Pen- 
manship Made  by  Children  in  the  Third  Grade  in  the  Salt 
Lake  City  Public  Schools  ^ 


Score 

Frequency 

Score  X  Frequency 

m 

/ 

Jm 

4 

3 

12 

5 

4 

20 

6 

21 

126 

7 

55 

385 

8 

85 

680 

9 

196 

1,764 

10 

46 

460 

11 

102 

1,122 

12 

44 

528 

13 

39 

507 

14 

11 

154 

15 

4 

60 

16 

4 

64 

614)5,882 

9 .  58  mean 

It  should  be  noted  that  the  true  mean  is  found  in  this 
case  the  same  as  in  Table  XVII.  The  principles  under- 
lying the  computation  of  the  simple  and  weighted  means 
are  the  same.  In  each  case  the  value  of  each  measure,  or 
score,  is  multiplied  by  its  frequency;  the  products  are 
added ;  and  their  sum  is  divided  by  the  number  of  measures. 

The  formula  for  the  weighted  arithmetic  mean  is 


M  = 


N 


where  M  equals  the  arithmetic  mean,  m  the  numerical  value 
of  any  measure,  /  the  corresponding  frequency  of 
occurrence,  S  the  sum  of  fm's  and  N  the  total  number  of 
measures. 


^  E.  P.  Cubberley,  School  Organization  and  Administration,  p.  154. 
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In  Tables  XVII  and  XVIII,  the  data  are  said  to  be 
ungrouped  and  the  exact  value  of  each  score  is  recorded; 
that  is,  there  were  three  pupils  who  made  a  score  of  4 
(Table  XVIII),  four  who  made  a  score  of  5,  and  so  on. 
It  so  happens  that  the  range  on  the  Thorndike  Handwrit- 
ing Scale  is  from  Quality  4  to  Quality  18  inclusive,  making 
a  total  range  of  15.  If,  however,  the  range  were  extended, 
or  the  samples  graded  on  a  percentage  basis,  the  grades 
might  range  from  0  per  cent  to  100  per  cent.  In  that  case 
the  distribution  might  have  100  different  scores  instead 
of  13,  as  shown  in  Table  XVIII. 

Making  a  distribution  table  with  from  80  to  100  different 
scores  in  it  would  involve  a  great  amount  of  labor.  In 
order  to  facilitate  matters  and  lessen  the  volume  of  labor, 
the  data  are  grouped  into  class  intervals.  The  number 
of  class  intervals  should  rarely  exceed  20  and  a  less  num- 
ber is,  many  times,  desirable.  The  only  reason  for  group- 
ing the  data  into  class  intervals  is  to  lessen  the  labor  in 
computing  the  arithmetic  mean.  By  this  method,  how- 
ever, accuracy  is  sacrificed  somewhat  to  save  labor;  but 
the  difference  between  the  true  arithmetic  mean  with  the 
data  ungrouped  and  the  mean  found  by  this  method  (that 
is,  with  the  data  grouped)  is  so  small  that  it  is  generally 
negligible. 

In  computing  the  arithmetic  mean  two  assumptions  must 
be  made  when  the  data  are  grouped  in  class  intervals :  (1) 
that  the  measures  are  distributed  uniformly  throughout 
the  class  interval;  and  (2)  that  for  purposes  of  computa- 
tion the  measures  in  any  class  interval  may  be  numerically 
represented  by  the  mid-point  of  the  class  interval. 

Two  methods  may  be  employed  in  the  solution  of  the 
arithmetic  mean  with  grouped  data.  Table  XIX  repre- 
sents the  data  in  Table  XVIII  grouped  in  class  intervals 
and  computed  by  the  traditional,  or  long  method. 
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Table  XIX. — Distribution  of  Scores  on  614  Samples  of  Penman- 
ship Made  by  Children  in  the  Third  Grade  in  the  Salt 
Lake  City  Public  Schools  to  Illustrate  the  Computation 
OF  the  Mean  with  Data  Grouped  in  Class  Intervals 


Class 
Intervals 

Mid-point 

of 

Class  Intervals 

m 

Frequency 
/ 

MeasuresX  Their  Correspond- 
ing Frequency 

fm 

16-17.99 

14^15.99 

12-13.99 

10-11.99 

8r-  9.99 

6-  7.99 

4-  5.99 

17 

15 

13 

11 

9 

7 

5 

4 
15 

83 
148 
281 

76 

7 

68 

225 

1,079 

1,628 

2,529 

532 

35 

614 

614)6,096 

9 .  92  weighted  arithmetic 
mean 

The  true  arithmetic  mean,  as  shown  in  Table  XVIII,  is 
9.58,  while  the  arithmetic  mean  computed  with  grouped 
data  is  9.92,  making  a  difference  of  0.34  of  a  score. 

Computation  of  Arithmetic  Mean  by  Short  Method. — 
Table  XX  illustrates  the  computation  of  the  arithmetic 
mean  by  the  short  method;  the  data  are  taken  from  Table 
XVIII. 

The  usual  method  in  arranging  the  frequency  distribu- 
tions is  to  begin  with  the  highest  scores  and  work  in  the 
direction  of  the  lower  ones,  that  is,  in  Table  XX,  we  place 
the  scores  from  16  to  17.99  at  the  top  of  the  distribution. 
This  is  not  absolutely  necessary,  but  it  is  more  convenient 
and  less  liable  to  errors. 

When  the  data  are  grouped  in  the  class  intervals,  as 
illustrated  in  Table  XX,  we  may  then  take  the  mid-point 
of  any  class  interval  as  the  assumed  mean.  It  is  best  to 
take  the  class  interval  that  contains  the  true  mean  although 
this  is  not  necessary.  The  point  9,  the  mid-point  of  the 
interval  8  to  9.99,  was  chosen  as  the  assumed  mean.    The 
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Table  XX 


Class 
Interval 

Frequency 
/ 

Deviation  from 

the  Assumed 

Mean  Interval 

d 

Frequency X 
Deviation 

fd 

16-17.99 

14-15.99 

12-13.99 

10-11.99 

8-  9.99 

6-  7.99 

4-  5.99 

4 

15 

83 

148 

281 
76 

7 

+4 
+3 
+2 
+  1 
0 
-1 
-2 

16 

45 

166 

148 

375 

-76 

-14  -  90 

614 

+285 

285^614  =  0.46 
0 .  46  X  2  =  0 .  92  =  c,  the  correction 


Assumed  mean. ..  .   9.0 
Correction 0 .  92 

True  mean 9 .  92 


mid-point  of  the  class  interval  immediately  above  is  11 
and  is  said  to  have  a  deviation  {d)  from  the  assumed 
mean  of  -\- 1.  The  second  class  interval  above  has  a 
deviation  of  +  2  and  so  on.  The  mid-point  of  the  class 
interval  immediately  below  the  assumed  mean  has  a  devia- 
tion of  —  1 ;  the  second  one  a  deviation  of  —  2,  and  so 
on.  We  next  multiply  the  deviations  (d)  by  the  frequency 
(/)  just  as  we  did  in  the  long  method.  This  gives  us 
the  fd  column  in  the  table.  It  is  evident  that,  if  the 
assumed  mean  occupied  the  same  position  as  the  true  mean, 
the  sum  of  the  deviations  above  it  would  equal  the  sum 
of  the  deviation  below,  but  since  the  true  and  the  assumed 
means  are  not  the  same,  the  measures  above  will  not  equal 
those  below.  Of  course,  it  is  possible  that  the  assumed 
mean  might  equal  the  true  mean,  in  which  case  there  would 
be  no  correction ;  but  this  would  rarely  happen.  In  most 
cases,  therefore,  a  correction  (c)  must  be  added  to  the 
assumed  mean  to  get  the  true  mean.     This  correction  is 
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the  algebraic  sum  of  the  fd's  divided  by  the  total  number 
of  cases  in  the  distribution.    It  is  expressed  by  the  formula 

Xfd 

in  which  c  equals  the  correction  to  be  added,  2  fd  equals 
the  algebraic  sum  of  the  frequencies  (/)  multiplied  by  their 
respective  deviations  {d),  and  N  is  the  number  of  cases  in 
the  entire  distribution. 

The  average  amount  of  deviations  from  the  assumed 
mean,  taken  in  the  right  direction  (that  is,  added  alge- 
braically) evidently  would  give  us  the  true  mean.  It  is 
evident  that,  if  the  sum  of  the  plus  fd'^  is  greater  than  the 
sum  of  the  minus  fd's,  the  true  mean  deviates  from  the 
assumed  mean  in  the  direction  of  the  positive  /(£'s,  or  the 
true  mean  is  greater  than  the  assumed  mean  by  an  amount 
equal  to  the  correction  c.  If,  however,  the  sum  of  the 
negative  /cZ's  is  greater  than  the  sum  of  the  positive  /<^'s, 
then  the  correction  must  be  subtracted,  and  the  true  mean 
is  less  than  the  assumed  mean. 

In  Table  XX  the  positive  /cZ's  exceed  the  negative  fd's 
by  285.  This  divided  by  the  total  number  of  cases  (614) 
gives  0.46  of  a  class  interval  that  each  of  the  measures  is 
in  error,  when  the  mean  is  considered  to  be  at  9,  the  mid- 
point of  the  class  interval.  Since  the  width  of  the  class 
interval  is  2,  we  multiply  the  average  error  0.46  by  2, 
giving  0.92  of  an  actual  unit  that  must  be  added  to  the 
assumed  mean.  Thus  we  have :  9  +  0.92  —  9.92  the  true 
mean. 

It  should  be  noted  that  the  short  method  is  short  only 
v/hen  the  range  is  long  and,  therefore,  many  class  intervals 
are  necessary.  If  the  range  is  short,  there  is  nothing 
gained  by  using  the  short  method.  Following  is  the  sum- 
mary of  steps  used  in  this  method: 
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SuMMAET  OP  Steps  in  the  Computation  of  the  Arithmetic 
Mean  By  the  Short  Method 

1.  Group  the  measures  in  a  frequency  distribution  table.  Ar- 
range the  data  in  the  table  in  four  columns.  The  first  column 
at  the  left  contains  the  class  intervals  arranged  with  the  highest 
scores  at  the  top  of  the  table;  the  second  column  contains  the 
frequencies  (/) ;  the  third  column  contains  the  deviations  from 
the  assumed  mean  (d) ;  the  fourth  column  contains  the  products 
of  the  frequencies  times  the  deviations  {fd^s). 

2.  Find  the  total  of  the  frequencies  in  the  second  column. 

3.  By  inspection  estimate  the  class  interval  that  contains  the 
mean  and  take  as  the  assumed  mean  the  mid-point  of  this  interval. 
(The  mid-point  of  any  class  interval  may  be  taken  as  the  assumed 
mean;  but  it  is  better  to  choose  the  class  interval  containing  the 
true  mean  or  one  near  it.) 

4.  Consider  each  class  inten'al  os  a  unit  and  record  in  the  third 
column  the  number  of  units  that  the  mid-point  of  each  class 
interval  deviates  from  the  assumed  mean;  the  first  one  above 
the  assumed  mean  having  a  deviation  of  +  1;  the  second  one  a 
deviation  of  -\-  2,  and  so  on.  Deviations  below  the  assumed  mean 
are  treated  in  the  same  way,  but  considered  negatively. 

5.  Multiply  each  deviation  (d)  by  its  corresponding  frequency 
(/■)  observing  the  algebraic  signs,  and  record  the  product  in 
column  4. 

6.  Find  the  algebraic  sum  of  the  fd's  in  column  4.  ] 

7.  Divide  this  sum  (which  is  the  difference  between  the  positive 
and  negative  fd's)  by  the  total  number  of  measures  {N)  which 
is  the  sum  of  the  frequencies  in  column  2.  This  gives  the  arith- 
metic mean  of  the  deviations  from  the  assumed  mean  in  terms 
of  class  intervals. 

8.  Multiply  the  deviation  in  terms  of  class  intervals  by  the 
number  of  units  in  the  class  interval. 

9.  Add  (algebraically)  the  product  obtained  in  8  to  the  as- 
sumed mean  to  get  the  true  mean. 

The  advantages  and  disadvantages  of  the  different  forms 
of  average  will  be  taken  up  after  the  7node  and  median 
have  been  discussed. 

The  Mode. — The  mode  is  that  item  or  term  that  is  most 
characteristic  or  frequent  in  a  distribution.    It  is  the  value 
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which  is  the  fashion  {la  mode).  It  represents  the  typical 
fact.  In  Table  XVII,  page  280,  there  is  no  mode  because 
one  item  or  grade  occurs  as  often  as  another.  In  Table 
XVIII,  score  9  represents  the  mode  because  that  score 
occurs  more  frequently  than  any  other. 

A  mode  may  he  defined  as  that  measure  of  a  variable 
fact  which  appears  more  frequently  than  measures  directly 
above  or  below  it.  Distributions  may,  therefore,  be 
unimodal  or  multimodal.  A  symmetrical  distribution  is 
unimodal  because  there  is  only  one  place  in  the  distribu- 
tion where  the  measures  are  of  greater  frequency  than 
those  directly  above  or  below  it. 

The  value  of  the  mode  as  a  measure  of  central  tendency 
over  the  average  may  be  illustrated  by  the  following 
example:  Suppose  one  were  told  that  the  average  wealth 
of  ten  farmers  living  on  a  certain  highway  was  over  $100,- 
000  each.  It  might  so  happen  that  one  of  these  farmers 
was  worth  $1,000,000  and  the  other  nine  were  worth  $1,000 
each.  The  average  or  mean  used  in  this  case  is  misleading. 
The  mode  would  be  a  much  better  average  to  use.  In  this 
case  the  mode  would  be  $1,000,  which  represents  the  group 
better  than  the  mean  which  would  be  more  than  $100,000. 
The  mode  has  little  statistical  value  other  than  an  in- 
spectional  average  and  will,  therefore,  be  discussed  very 
briefly. 

The  Median, — The  fact  that  the  median  has  not  been 
rigorously  defined,  or,  if  thus  defined,  the  definition  has 
not  been  generally  accepted,  has  led  to  considerable  con- 
fusion in  its  computation. 

Rugg  says  that  the  median  is  ''defined  rigorously  as 
that  point  on  the  scale  of  the  frequency  distribution  on 
each  side  of  which  one  half  of  the  measures  f aU. ' '  ^  The 
measures,  of  course,  must  be  arranged  according  to  their 

2  Harold  O.  Rugg,  Statistical  Methods  Applied  to  Education,  p.  104. 


288  EDUCATIONAL  MEASUREMENT 

ascending  or  descending  values.  While  Rugg  calls  this  a 
rigorous  definition,  it,  nevertheless,  has  admitted  of  many 
solutions  because  of  different  interpretations  of  the  limits 
of  class  intervals  and  the  lack  of  uniformity  in  the  dis- 
tribution of  the  cases  over  them.  While  theoretically  there 
would  be  no  class  intervals  near  the  center  of  the  dis- 
tribution that  contains  no  measures,  yet  actually  this  hap- 
pens many  times  in  practice  when  the  number  of  measures 
in  the  distribution  is  small. 

In  practice  we  have  not  been  rigorously  consistent  in 
the  scoring  of  test  papers  and  the  computation  of  the 
median  from  the  scores  thus  obtained.  Teachers  and 
students  have  had  considerable  trouble  in  finding  medians, 
because  in  actual  practice  a  half  dozen  methods  are  being 
employed,  no  two  of  which  will  give  the  same  result.  In 
order  to  help  clarify  the  procedure  and  point  out  the  basic 
principles  and  assumptions  made,  we  shall  discuss  the  com- 
putation of  the  median  somewhat  in  detail.  Before  doing 
this,  however,  we  shall  give  some  additional  definitions 
of  the  median  in  order  to  have  it  clearly  in  mind  as  the 
discussion  proceeds. 

Thorndike  defines  the  median  thus:^  "The  median,  or 
50  percentile  or  mid-measure  is  the  place  on  the  scale 
reached  by  counting  half  the  measures,  in  the  order  of 
their  magnitude,  or  the  place  on  the  scale  above  and  below 
which  are  equal  numbers  of  the  measures."  This  defini- 
tion and  the  one  given  by  Rugg  make  no  provision  for  the 
computation  of  the  median  when  the  number  of  measures 
in  the  distribution  is  even  and  there  is  no  middle  measure. 
As  a  consequence,  statisticians  have  used  a  variety  of 
methods  in  computing  medians  with  distributions  of  this 
kind. 

Secrist  defines  the  median  thus:*     "The  median  of  a 


8  Edward  L.  Thorndike,  Mental  and  Social  Measurements,  pp.  36-37. 
■•Horace  Secrist,  An  Introduction  to  Statistical  Methods,  p.  238. 
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series  is  that  item — actual  or  estimated — in  a  series,  when 
arranged  consecutively,  which  divides  the  distribution  into 
equal  parts.  When  the  number  of  items  is  even,  it  is 
half  way  between  the  two  middle  terms ;  when  the  number 
is  odd,  it  is  the  middle  term. ' '  This  definition  is  the  same 
as  that  used  by  McCall  who  defines  the  median  thus:^ 
"When  measures  are  arranged  in  order  of  size,  the  median 
is  the  middle  measure  or  (lacking  a  middle  measure)  mid- 
way between  the  two  middlemost  measures."  A  median 
thus  defined  is  perfectly  definite  and  admits  of  but  one 
interpretation  provided  there  is  uniformity  in  the  treat- 
ment of  the  measures  in  the  class  intervals.  We  shall  now 
note  wherein  the  methods  have  differed  in  the  computation 
of  the  median. 

1.  The  Spread  of  the  Score  Interval  Commonly  Used  in 
Statistics  and  School  Practice. — When  a  pupil  takes  a  test 
in  addition  in  arithmetic,  for  instance,  what  are  the  limits 
of  the  various  scores?  That  is,  does  a  score  of  2  mean 
that  the  pupil  has  done  a  definite  amount  of  work,  just 
barely  finished  two  problems,  and  no  more,  or  does  it  mean 
something  else?  What  do  five  problems  mean?  The  cus- 
tom is  to  give  no  credit  unless  the  pupil  completes  at  least 
one  problem.  If  he  does  less  then  one  problem,  he  is  given 
a  grade  of  zero.  Therefore,  zero  means,  in  actual  practice, 
anything  between  just  not  any  of  the  thing  in  question 
and  1.  One  problem  means  any  amount  between  1  and  2, 
but  not  exactly  1  and  no  more.  Five  problems  means 
any  amount  between  5  and  6,  and  so  on.  We  thus  use  the 
lower  limit  of  the  step,  or  score  interval,  in  recording 
grades  in  arithmetic  and  in  most  of  the  other  subjects  in 
the  curriculum.  In  finding  the  median  score  in  most  of 
the  school  achievement  tests,  however,  the  custom  has  been 
to  use  the  middle  of  ^the  step  in  computing  the  median  and 

5  William  A.  McCall,  "How  to  Compute  the  Median,"  Teachers 
College  Record,  Vol.  21,  March,  1920,  p.  126. 
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the  lower  limit  of  the  score  interval  in  scoring  the  papers 
and  recording  the  scores. 

From  the  standpoint  of  statistics,  the  middle  of  the  score 
interval  is  the  best  measure  to  use.  But  the  score  interval 
generally  used  in  statistics  is  not  the  same  as  the  one  used 
in  actual  practice  in  scoring  test  papers.  From  1  to  2, 
2  to  3,  etc.,  are  the  score  intervals  used  in  scoring  papers. 
But  in  the  computation  of  the  median  the  common  practice 
is  to  employ  score  intervals  with  their  limits  at  0.5,  1.5, 
2.5,  etc.  Statistical  treatment  is  not  consistent  and  does 
not  conform  to  the  practice  in  at  least  three  respects:  (1) 
Statistical  methods  use  the  mid-point  of  the  score  interval 
in  computing  the  median  but  use  the  lower  limit  of  the 
score  interval  in  scoring  the  papers.  (2)  In  statistics  one 
score  interval  is  used  in  computing  the  median,  and  an- 
other is  used  in  scoring  the  papers  and  recording  the 
grades.  (3)  In  statistics  the  practice  has  been  to  use  a 
different  score  interval  in  computing  medians  from  that 
used  by  teachers  in  scoring  test  papers  and  recording  the 
grades.  In  the  following  pages  we  shall  note  how  statistical 
methods  may  be  made  more  consistent  and  at  the  same  time 
conform  to  the  methods  used  by  teachers  in  scoring  papers. 

2.  What  Formula  Shall  We  Use  in  Computing  a 
Median  f — Some  authors  recommend  the  use  of  the  formula 
N  +1 


2 


in  finding  the  median,  and  others  recommend  the 


formula — .   Both  formulas  will  not  give  the  same  result 

in  all  cases.    The  question  then  arises  as  to  which  formula 
should  be  used,  and  if  both  are  to  be  employed,  when  shall 
the  former  be  used,  and  when  shall  we  use  the  latter? 
It  seems  to  the  author  that  there  is  no  very  good  reason 

lor  usmg  the  formula  '-  since  the  same  point  on  the 
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scale  is  not  reached  in  computing  the  median  if  we  employ 
this  formula  and  count  in  from  each  end  of  the  series. 
Furthermore,  if  we  make  the  score  intervals   consistent 

N  -\-l 

with  our  logic  in  scoring  papers,  then  the  formula 

2 
cannot  be  employed  at  all.     It  is  sometimes  argued  that 
if  you  want  the  ordinal  number  of  the  median  you  should 

N  -{-1 

use  the  formula but  if  you  want  the  median  point 

2 

A'' 
on  the  scale,  the  formula  — should  be  used.    In  answer  to 

2 
this  argument  it  may  be  shown  that  both  the  ordinal  num- 
ber and  the  median  point  on  the  scale  may  be  found  by 
the  latter  formula.     We  shall  now  illustrate  the  various 
eases  in  the  computation  of  the  median. 

Computation   of  the   Median,    Simple   Distribution. — 
Case  1.    The  number  of  items  in  the  distrihution  is  odd. 


Table  XXI. — Scores  Made  by  Thirteen  Sixth-Grade  Pupile 
THE  Courtis  Arithmetic  Tests,  Series  B 

i   IN 

Pupa 

c 

M 

L 

A 

D 

B 

F 

J 

H 

I 

K 

e 

G 

Score 

16 

15 

14 

13 

12 

11 

10 

9 

8 

7 

6 

5 

4 

Since  the  median  is  the  measure  in  the  middle  score 
interval,  it  may  be  located  by  dividing  the  score  intervals 
by  2.  (The  score  interval  here  is  the  same  as  the  class 
interval  since  there  is  only  one  score  in  a  class  interval.) 

—  =  —  =  6.5.     In  our  reasoning  let  us  make  the  score 
2       2 

intervals  conform  to  practice  in  scoring  papers  and  assume 

that  they  are  from  4  to  5 ;  5  to  6,  etc.     Starting  with  the 

score  interval  4-5  and  counting  6.5  score  intervals  locates 

the  median  at  10.5.     Counting  down  from  the  other  end 

of  the  series  we  reach  the  same  point.     This  is  both  the 
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mid-point  of  the  middle  measure  (a  measure  is  assumed 
to  be  distributed  over  a  score  interval)  and  the  mid-point 
on  the  scale.  The  middle  score  is  the  score  made  by  pupil 
F.  Its  mid-point  is  halfway  between  10  and  11  or  10.5. 
Therefore  10.5  is  the  median.  The  customary  way  of 
computing  the  median  is  to  take  the  score  intervals  one- 
half  interval  lower  than  these  and  make  10  the  median 
score  instead  of  10.5.  This,  however,  is  inconsistent  with 
the  method  of  scoring  the  papers. 

Case  2.  The  nuniher  of  items  or  scores  is  even. — Let  us 
eliminate  pupil  G  with  a  score  of  4  from  the  preceding 
distribution  and  compute  the  median  with  an  even  number 
of  cases.    Table  XXI  now  becomes  Table  XXII. 


Table  XXII. — Scores  Made  by  Twelve  Sixth-Grade  Pupils  in 
THE  Courtis  Arithmetic  Tests,  Series  B 


Pupil 

c 

M 

^ 

A 

D 

B 

F 

J 

9 

H 

8 

I 
7 

K 

6 

E 

Score 

16 

15 

14 

13 

12 

11 

10 

5 

It  is  evident  that  the  median  score  cannot  be  a  middle 

score  in  this  series  since  there  is  no  middle  score.    Apply- 

TV"     12 
ing  the  formula  used  in  Case  1  we  have  —  =  —  =  6.  Since 

2       2 

there  is  no  middle  score,  some  provision  must  be  made  to 
satisfy  a  case  of  this  kind.  Since  there  are  12  score  inter- 
vals, it  is  evident  that  if  we  took  the  junction  point  of 
the  6th  and  7th  score  interval  we  would  have  the  mid- 
point on  the  scale.  Starting  with  the  5-6  interval  and 
counting  in  six  intervals  we  have  11  as  the  mid-point  on 
the  scale.  This  value  is  obtained  by  counting  in  from 
either  end  of  the  distribution.  The  usual  method  of  rep- 
resenting a  score  interval  in  statistics  is  by  its  mid-point. 
The  mid-point  of  the  10-11  score  interval  is  10.5  and  the 
mid-point  of  the  11-12  score  interval  is  11.5.     If  we  take 
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half  of  these  two  middlemost  values,  we  get  11  as  the 
median  score  which  conforms  to  the  definition  given  by 
McCall  and  Secrist.  This  method  is  logical  and  conforms 
to  the  way  scores  are  computed  from  test  papers.  If,  as 
has  been  the  custom,  the  score  intervals  are  taken  at  6.5 
to  7.5 ;  7.5  to  8.5 ;  etc.,  then  the  median  score  would  be  10.5 
instead  of  11.  This  method,  however,  is  inconsistent,  as 
was  pointed  out  above. 

When  the  Distribution  Is  Complex. — Case  3.  Where 
more  tlian  one  pupil  make  the  same  score  and  the  data 
are  grouped  in  class  intervals. — The  distribution  of  edu- 
cational data  is  rarely  so  simple  as  those  given  in  Tables 
XXI  and  XXII.  Instead  of  a  score  being  made  by  just 
one  pupil  it  is  usually  made  by  more  than  one,  sometimes 
by  hundreds  of  pupils.  This  necessitates  grouping  the 
data  in  class  intervals  in  order  to  condense  the  distribu- 
tion table  within  workable  limits.  It  also  involves  the 
distribution  of  the  items  within  the  class  intervals. 

The  question  of  discrete  and  continuous  series  discussed 
in  Chapter  IX  should  be  reviewed  in  order  to  have  clearly 
in  mind  the  type  of  data  under  consideration.  Most 
measurements  in  education  belong  to  a  continuous  series 
or  may  be  treated  as  continuous  even  though  discrete. 

Having  decided  the  question  as  to  whether  the  data  are 
discrete  or  continuous,  the  next  important  question  to  de- 
cide is  the  distribution  of  the  items  in  the  various  class 
intervals.  Suppose  we  had  a  group  of  50  children  who 
made  a  score  of  7  in  arithmetic  on  the  Courtis  arithmetic 
test.  The  chances  are  that  some  of  the  pupils  had  the 
eighth  problem  almost  finished,  when  the  signal  to  stop 
was  given.  Others  had  it  three-fourths  finished,  some  had 
it  half  finished,  some  one-fourth  finished,  and  a  few  had 
just  barely  finished  the  seventh  problem  when  time  was 
called.  In  other  words,  instead  of  the  50  pupils  just  barely 
finishing  the  seventh  problem  the  instant  that  time  was 
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called,  the  probabilities  are  that  they  were  distributed 
about  equally  over  the  interval  7  to  7.9999  -f-  Instead 
of  having  50  pupils  make  a  score  of  7,  let  us  simplify  the 
problem  and  take  a  distribution  where  four  pupils  make 
a  score  of  7  and  the  median  falls  within  the  7-8  interval. 


Table  XXIII 

Pupil 

c 

M 

L 

A 

D 

B 

F 

J 

H 

I 

K 

E 

G 

Score 

9 

9 

9 

8 

8 

7 

7 

7 

7 

6 

5 

5 

4 

The  number  of  scores  is  13.  The  median  score  is  the 
seventh  score  counting  in  from  either  end  of  the  series. 
The  score  made  by  pupil  f  is  therefore  the  median  score. 
But  there  are  four  pupils  who  made  a  score  of  7.  The 
best  guess  we  can  make  as  to  the  way  those  four  scores 
are  distributed  in  the  class  interval  7-7.9999  +  is  to  say 
that  they  are  distributed  equally  over  the  class  interval. 

The  following  illustration  will  make  it  clear  what  is 
meant  by  being  distributed  equally  over  the  class  interval. 
The  table  shows  that  pupil  g  had  solved  four  problems; 
pupil  E,  five;  pupil  k,  five;  pupil  i,  six;  pupils  h,  j,  f, 
and  B,  seven,  and  so  on,  when  the  examiner  gave  the  signal 
to  stop.  It  does  not  mean,  however,  that  just  the  instant 
the  signal  to  stop  was  given  that  pupil  G  had  just  barely 
finished  four  problems  and  had  not  started  on  the  fifth 
one,  or  that  pupils  e  and  k  had  just  barely  finished  five 
problems  and  had  done  no  work  on  the  sixth,  and  so  on. 
These  scores  indicate  the  number  of  problems  actually  com- 
pleted and  each  of  the  13  pupils  may  or  may  not  have 
attempted  more  problems  than  those  actually  completed. 
Since  the  interval  7  to  7.9999  +  is  the  interval  that  con- 
tains the  median,  the  problem  is  to  find  out  what  is  the 
most  probable  amount  of  work  done  by  the  seventh  pupil 
counting  in  from  either  end  of  the  series.    Let  us  represent 
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graphically  the  7-8  interval.  We  know  that  four  pupils 
completed  seven  problems ;  but  we  do  not  know  how  much 
more  they  did.  Now,  our  best  guess  would  be  that  the 
actual  amount  of  work  done  by  these  four  pupils  is  dis- 
tributed somewhat  as  follows: 

Let  us  represent  the  class  interval  7.00  to  7.9999  -| —  8 
by  Figure  VII  and  let  the  line  AB  represent  the  distance 
through  this  step.  When  the  signal  to  stop  was  given, 
the  first  of  these  four  students  (student 
h)  had  completed  7  problems  and  had 
done  work  on  the  eighth  one  ranging  in 
amount  somewhere  between  0  per  cent 
and  25  per  cent;  J,  the  second  student, 
had  the  eighth  problem  from  25  per 
cent  to  50  per  cent  completed;  f,  the 
third  pupil,  had  completed  from  50  per 
cent  to  75  per  cent  of  the  eighth  prob- 
lem, and  B,  the  fourth  pupil,  had  com- 
pleted from  75  per  cent  to  100  per 
cent.  This  would  be  a  safer  guess  than 
to  guess  that  all  four  of  these  students  had  just  barely 
completed  the  seven  problems  when  the  examiner  gave  the 
signal  to  stop.  Still  another  guess  is  necessary.  Assum- 
ing that  we  know  that  the  first  pupil  in  the  7-8  interval 
had  done  work  on  the  eighth  problem  ranging  in  amount 
somewhere  between  0  per  cent  and  25  per  cent,  and  that 
we  wanted  to  make  the  best  estimate  we  could  make,  tak- 
ing one  case  with  another,  as  to  how  much  he  actually 
did,  we  would  estimate  that  he  had  gone  halfway  through 
this  interval  0  per  cent  to  25  per  cent,  or,  that  he  had 
done  12.5  per  cent  of  the  eighth  problem  when  the  signal 
to  stop  was  given.  Reasoning  the  same  way  for  the  other 
score  intervals,  the  second  pupil  (j)  would  have  had  37.5 
per  cent  of  the  eighth  problem  completed;  the  third  (f) 
62.5  per  cent,  and  the  fourth  one  (b)  87.5  per  cent. 


Figure  VII 
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Now,  since  we  have  the  most  reasonable  distribution  of 
these  eases  over  the  class  interval,  we  are  ready  to  con- 
tinue the  work  in  computing  the  median.  Starting  at  the 
lower  end  of  the  distribution  we  have  four  cases  up  to  the 
7-8  interval.  Since  the  median  score  is  the  score  made 
by  the  seventh  pupil,  and  since  there  are  13  score  intervals 
in  the  entire  series,  therefore,  counting  in  from  either  end 

of  the  series  —  score  interval,  we  would  have  half  the 

2 

distance  through  the  series  and  also  have  the  mid-point 
of  the  seventh  score  interval.  "We  have  four  cases  up  to  the 
7-8  class  interval  and  must  have  2.5  of  the  4  score  intervals 
in  the  7-8  class  interval.  Noting  Figure  VII,  we  see  that 
2.5  score  intervals  would  give  us  a  point  62.5  per  cent 
through  the  7-8  class  interval,  or  a  median  point  on  the 
scale  of  7.625.  But  7.625  is  also  the  mid-point  of  the 
seventh  score  interval.    Therefore  the  median  is  7.625. 

Table  XXIV  illustrates  the  computation  of  the  median 
with  the  data  grouped  in  class  intervals,  each  of  which 
contains  five  units  instead  of  one,  and  with  some  of  the 
class  intervals  containing  a  large  number  of  cases. 

Table  XXIV. — Distribution  of  Marks  Given  in  English  to  263 
High-School  Pupils 
Class  Interval  Number  of  Pupils 

95.0-100.00 20 

90.0-  94.99 63    (83)  Adding  down 

85.0-  89.99 38  (121) 

80.0-  84.99 47 

75.0-  79.99 38    (95) 

70.0-  74.99 33    (57) 

65.0-  69.99 16    (24) 

60.0-  64.99 2      (8) 

55.0-  59.99 3      (6) 

50.0-  54.99 1      (3) 

45.0-49.99 1      (2)  Adding  up 

40.0-44.99 1 

AT  =  263 
Applying  formula,  -7y-~7r  — 131.5. 
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Since  there  are  263  eases,  131.5  score  intervals  from 
either  end  of  the  series  will  be  the  mid-point  on  the  scale 
and  also  the  mid-point  of  the  middlemost  measure. 

Adding  up  from  the  bottom,  we  have  95  cases  up  to  the 
80-84.99  class  interval.  This  class  interval  contains  47 
cases,  or  47  score  intervals,  and  we  must  take  36.5  of  these 
score  intervals  to  give  us  the  median  point  on  the  scale, 
and  the  mid-point  of  the  middlemost  measure.  The  same 
result  is  obtained  by  counting  down  from  the  top.  There 
are  121  scores  down  to  the  upper  margin  of  the  class 
interval  80.0-84.99.     Therefore,  we  must  take  10.5  scores 

10  5 

from  the  47  in  that  interval  or  go  -j^  the  distance  through 

the  class  interval  coming  in  from  the  top.    Since  the  num- 

10  5 

ber  of  units  in  the  interval  is  five,  —ri=-  of  5  subtracted 

'    47 

from  85,  the  upper  limit  of  the  interval,  equals  83.88,  which 

is  the  median. 

Case  4.  Where  the  median  falls  in  the  100  or  the  zero 

class  interval. 

Table  XXV. — Distribution  op  the  Marks  Given  to  a  Freshman 
Class  in  Algebra 

Class  Interval  Frequency 

100 13 

90-99.99 1 

80-89.99 2 

70-79.99 1 

60-69.99 1 

50-59.99 2 

40-49.99 1 

30-39.99 1 

20-29.99 1 

23 

N     23 
Substituting  in  the  formula,  — -=  —  =  11.5.      Counting 

2       2 

from  either  end  of  the  series  we  find  that  the  median  lies 
in  the  100  interval.     The  cases  in  this  interval  are  undis- 
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tributed  and  are  considered  to  lie  at  a  point;  therefore 
there  is  no  correction,  and  the  median  is  100. 

In  the  solution  of  problems  no  credit  is  usually  given 
unless  the  student  solves  one  or  more  problems  correctly. 
In  a  class  of  25,  it  might  be  that  13  pupils  would  fail  to 
complete  any  problems.  In  this  case  the  median  would 
depend  upon  whether  the  cases  in  the  zero-interval,  that  is, 
from  0  to  0.999  +,  were  distributed  equally  over  the  inter- 
val or  whether  they  were  considered  to  be  piled  up  at  its 
lower  limit.  From  the  reasoning  in  the  former  cases  it 
would  be  more  logical  to  consider  them  distributed  over 

13 

the  interval  and  take  of  the  distance  through  the 

JL2.D 

interval  as  the  median  measure.    Therefore,  the  median  is 
0.96. 

Case  5.  When  the  partial  sum  is  the  half  sum  and  there 
is  no  correction. — Another  case  that  proves  troublesome 
is  illustrated  in  Table  XXVI,  taken  from  Monroe.^ 

Table  XXVI 

Scale  Frequency 

15 

14 

13 1 

12 1 

11 2 

10 3 

9 5 

8 4 

7 6 

6 3 

5 

4 

Total 25 

Approximate  median 9.0 

Correction 0.0 

True  median 9.0 


'  Measuring  the  Resulls  of  Teaching,  p.  106. 
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In  explaining  the  median  in  this  case,  Monroe  says: 
*'Case  A  is  where  the  partial  sum  (13)  is  also  the  half 
sum.  The  approximate  median  is  in  the  next  interval  (9). 
Since  the  difference  between  the  partial  sum  and  the  half 
sum  is  zero,  there  is  no  correction  and  the  true  median 
is  9.0." 

This  practice  does  not  conform  to  the  theory  discussed 
above.  Neither  is  it  consistent  with  the  instructions  that 
Monroe  gives  for  finding  the  median  in  his  reading 
tests. 

Reasoning  as  we  did  from  Table  XXIII,  we  note  that 
since  the  interval  8-9  contains  four  measures  which  are 
theoretically  distributed  equally  over  the  interval,  the 
thirteenth  measure  would  lie  in  the  class  interval  some- 
where between  8.75  and  8.9999  and  our  best  guess  is  that 
it  lies  at  the  mid-point  of  that  interval  or  at  0.875  of  the 
distance  through  the  class  interval,  which  would  make  the 
median  8.875  instead  of  9. 

Case  6.  Where  measures  are  discrete. — Table  XXVII 
illustrates  the  computation  of  the  median  with  measures 
discrete. 

Table  XXVII. — Illustrating  the  Computation  of  the  Median, 

Series  Discrete 

Number  of  Pupils  Frequency 

in  Class 

10 14 

11 17 

12 21 

13 27 

14 20 

15 18 

Total 117 

Since  there  are  117  measures  the  59th  measure  is  the 
median  measure.  Counting  up  from  the  bottom  we  find 
that  the  59th  class  lies  somewhere  among  the  classes  con- 
taining 13  pupils.     Now  since  it  is,  of  course,  impossible 
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for  a  class  to  contain  a  fraction  of  a  pupil,  the  median 
is  13. 

Case  7.    Where  the  median  falls  within  a  class  interval 
containing  no  cases. 

Table  XXVIII 

Scale  Frequency 

11 8 

10 10 

9 20 

8 0 

7 10 

6 13 

5 10 

4 5 

Total 76 

—  equals  38.     There  being  an  equal  number  of  cases, 

the  median  is  mid-point  between  the  two  middle  cases. 
But  there  is  one  class  interval  between  the  two  middle- 
most cases  that  contains  no  measures.  Since  the  number 
of  cases  is  even,  the  median  point  on  the  scale  would 
ordinarily  be  the  junction  point  of  the  two  middlemost 
score  intervals,  but,  since  the  two  middlemost  score  inter- 
vals are  not  contiguous,  we  add  half  the  intervening  class 
interval  to  the  score  interval  above  and  the  other  half 
to  the  score  interval  below  and  locate  the  median  at  the 
mid-point  of  the  gap  between  the  two  middlemost  score 
intervals.  Halfway  through  the  8-9  interval,  therefore,  or 
8.5,  is  the  median. 

A  somewhat  lengthy  discussion  has  been  given  on  the 
computation  of  the  median,  because  the  median  is  one  of 
the  chief  measures  of  central  tendency,  and  also  because 
the  methods  are  not  uniform.  We  shall  now  give  a  sum- 
mary statement  of  the  steps  for  the  computation  of  thej 
median. 


MEASUREMENTS  OF  AVERAGES  301 


Summary  op  Steps  in  the  Computation  of  the  Median 

1.  Arrange  the  data  in  a  frequency  distribution  taking  special 
care  to  note  the  limits  of  the  class  intervals  and  also  the  limits 
of  score  intervals. 

2.  Find  one-half  the  sum  of  the  measures. 

3.  Beginning  at  either  end  of  the  distribution,  preferably  the 
lower  end,  count  the  number  of  measures  included  in  all  class 
intervals  up  to  the  interval  containing  the  median. 

4.  Subtract  this  number  from  the  half  sum  of  all  the  measures 
computed  in  step  2.  The  difference  is  the  number  of  measures 
that  must  be  taken  from  the  next  interval  to  bring  the  com- 
putation up  to  the  median  point  on  the  scale. 

5.  Divide  this  remainder  by  the  number  of  cases  in  the  class 
interval  containing  the  median  and  multiply  the  quotient  by  the 
number  of  units  in  the  class  interval, 

6.  Add  this  number  to  the  value  of  the  lower  limit  of  the 
class  interval  containing  the  median,  if  computation  is  made  from 
the  lower  end  of  the  distribution,  and  subtract  it  from  the  upper 
limit  of  this  class  interval,  if  computation  was  made  from  the 
upper  end  of  the  distribution.  This  is  the  median  point  on  the 
scale. 

If,  in  finding  the  median,  no  other  measures  are  desired, 
care  need  be  taken  only  in  the  arrangement  of  the  cases 
near  the  median  value.  Consequently  the  median  cannot 
give  detailed  information  of  the  measures  at  the  extremities 
of  the  ranges.  On  the  other  hand,  the  median  is  a  fairly 
stable  measure  and  changes  very  slowly  when  different 
samples  are  taken,  which  means  that  it  is  not  greatly 
affected  by  the  presence  of  accidental  and  irrelevant  in- 
fluences. 

Comparison  of  the  Arithmetic  Mean,  Mode,  and  Median. 
— ^We  are  now  in  a  position  to  compare  the  three  kinds  of 
averages  with  a  view  of  determining  their  more  salient 
characteristics.  Following  is  the  comparison  of  the  arith- 
metic mean,  the  mode,  and  the  median: 
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Arithmetic  Mean 

1.  The  arithmetic 
mean  takes  into 
consideration  all 
cases  and  is  affected 
by  their  size. 

2.  The  arithmetic 
mean  is  aiJected  by 
every  item  in  the 
group. 


3.  The  mean  may  be 
found  without  ar- 
ranging the  meas- 
ures according  to 
their  magnitude. 

4.  The  mean  may  be 
determined  when 
the  aggregate  and 
the  number  of  cases 
are  known. 

5.  The  mean  may  fall 
where  no  data  ac- 
tually exist. 


Mode 

1.  The  mode  deals 
with  only  the  most 
representative 
measures  and  neg- 
lects the  extreme 
cases. 

2.  The  mode  is  deter- 
mined by  the  most 
frequent  measures 
only. 


3.  The  measures  must 
be  arranged  accord- 
ing to  their  magni- 
tude. 

4.  The  mode  may  be 
located  without  the 
number  of  cases  or 
the  extreme  cases. 


5.  The  mode  falls 
where  the  cases  are 
most  numerous. 


Median 

1.  The  median  takes 
into  consideration 
all  the  cases;  but 
the  size  of  extreme 
cases  does  not  af- 
fect it. 

2.  The  median  is  a 
counting  average 
and  is  affected  by 
the  number  of  cases, 
but  not  by  the  size 
of  the  extreme 
cases. 

3.  The  measures  must 
be  arranged  accord- 
iag  to  their  magni- 
tude. 

4.  The  sum  of  the 
measures  and  their 
number  do  not  fur- 
nish sufficient  data 
to  compute  the  me- 
dian. 

5.  The  median,  like 
the  mean,  may  be 
interpolated,  and 
fall  where  no  case 
actually  exists. 


A  comparison  of  the  above  averages  indicates  that  the 
nature  of  the  data  and  the  problem  to  be  solved  must  deter- 
mine the  average  to  be  used.  If  the  size  of  the  measures 
and  the  number  of  cases  are  to  be  taken  into  consideration, 
then  the  arithmetic  mean  is  the  average  to  use.  If,  how- 
ever, the  most  characteristic  measure  of  the  group  is 
wanted,  the  mode  best  satisfies  this  condition.  The  arith- 
metic mean  has  the  advantage  of  being  a  common  measure 
and  one  with  which  the  public  is  familiar.  Its  calculation 
is  simple,  but  it  is  greatly  effected  by  extreme  cases  and 
for  that  reason  it  should  many  times  give  way  to  the 
median  or  mode.  One  disadvantage  of  the  mode  is  the  fact 
that  there  is  many  times  no  well-defined  type,  and  one 
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measure  appears  as  often  as  another.     In  this  ease  the 
median  is  probably  the  most  representative  term. 

Quartiles  and  Percentiles. — It  is  sometimes  convenient 
to  divide  the  distribution  into  divisions  smaller  than  those 
made  by  the  median,  that  is,  to  divide  it  into  quarters, 
tenths,  etc.  The  medians  dividing  the  halves  of  the  dis- 
tribution into  equal  parts  are  known  as  quartiles.  Start- 
ing v^ith  the  lower  end  of  the  distribution,  the  median 
dividing  the  lower  half  is  known  as  the  first  quartile, 
(Q-l),  whereas  the  median  dividing  the  upper  half  of  the 
distribution  is  known  as  the  third  quartile,  {Q3).  The  com- 
putation of  the  quartiles  is  the  same  as  the  median  except 

that  in  the  first  quartile  we  take  the  —  case  and  in  the 

third  quartile  the  —  case  instead  of  the  —  case,  as  in  the 

median. 

In  the  same  way  we  may  find  any  desired  percentile 
in  the  distribution.  If  the  distribution  is  divided  into 
ten  equal  parts,  the  division  points  are  known  as  deciles. 
A  series  of  such  measures  gives  a  more  complete  picture  of 
the  distribution  than  can  be  obtained  from  a  single 
measure. 


CHAPTER  XI 

MEASUREMENTS  OP   DISPERSION,   OR   VARIABILITY 

In  the  previous  chapter  we  noted  the  measures  of  central 
tendencies.  We  shall  now  note  the  dispersion  or  variation 
from  these  measures.  Measures  of  variation  call  special 
attention  to  the  degree  of  homogeneity  which  characterizes 
the  distribution.  The  simplest  measure  of  dispersion  is 
the  range,  that  is,  the  difference  between  the  greatest  and 
least  magnitude  in  the  series.  While  this  is  a  simple  measure 
of  dispersion,  it  is  a  very  imperfect  one,  as  will  be  shown 
later,  because  two  distributions  might  have  the  same  range 
and  yet  differ  widely  in  their  "  scatteration."  A  few  illus- 
trations wiU  make  clear  the  point  why  measures  of  varia- 
bility are  necessary  in  describing  a  group  of  data. 

Suppose  a  teacher  who  contemplated  going  to  another 
state  to  teach  school  was  told  that  the  mean  salary  of 
teachers  in  that  state  was  $1,250.  It  is  evident  that  the 
mean  does  not  convey  sufficient  information  to  make  one 
intelligent  as  to  what  his  chances  are  of  getting  the  mean 
salary.  It  might  be  that  nine-tenths  of  the  teachers  received 
salaries  between  $1,200  and  $1,600,  in  which  case  one's 
chance  of  getting  a  salary  pretty  close  to  the  average  would 
be  good.  Or  it  might  be  that  no  teacher  got  a  salary  within 
$200  of  the  average.  The  measure  of  central  tendency  does 
not,  therefore,  give  one  anything  like  exact  information  as 
to  what  to  expect. 

When  one  says  that  the  average  grade  or  the  median  grade 
of  a  freshman  class  in  algebra  is  75  per  cent,  it  does  not 
indicate  the  distribution  of  the  grades  in  that  class.     They 
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may  range  from  5  per  cent  to  100  per  cent,  or  from  65  per 
cent  to  80  per  cent,  or  cover  any  other  range  from  0  per  cent 
to  100  per  cent.  The  description  of  the  distribution  is  far 
more  nearly  complete  when  the  dispersion  or  ''  scatteration  " 
is  given. 

At  best  the  average  is  but  a  partial  measure  of  type.  If 
one  desired  to  compare  the  salaries  of  teachers  in  two  states 
or  the  grades  in  two  schools  and  had  nothing  but  the 
measures  of  central  tendency,  it  would  be  impossible  to 
draw  any  definite  conclusions,  because  the  forms  of  the 
distributions  might  vary  greatly.  Two  distributions  might 
have  the  same  range  and  the  same  mean,  median,  and  mode 
and  still  differ  widely  as  to  form.  In  one  distribution  the 
cases  might  be  concentrated  near  the  measure  of  central 
tendency  while  in  the  other  they  might  be  distributed 
about  equally  over  the  entire  range.  Or,  again,  the  meas- 
ures in  one  distribution  might  be  concentrated  at  the  ends 
of  the  range  while  in  another  distribution  they  might  be 
concentrated  at  or  near  the  center. 

How  Variability  Is  Measured. — We  noted  in  the  previ- 
ous chapter  that  a  measure  of  central  tendency  was  a 
'position,  or  point  on  the  scale.  Variability  differs  from  a 
measure  of  central  tendency  in  that  the  latter  is  a  point  or 
a  position  on  the  scale  whereas  the  former  is  a  distance. 
Variability  is  expressed  as  the  distance  on  the  scale  that 
will  include  a  certain  proportion  of  the  measures  in  the  dis- 
tribution. This  distance  is  expressed  in  various  units, 
depending  in  part  on  the  nature  of  the  distribution  and  in 
part  on  the  arbitrary  choice  of  the  statistician. 

Measures  of  Absolute  Variability. — There  are  four  meas- 
ures of  absolute  variability  in  common  use.     They  are : 

1.  The  range,  which  includes  aU  of  the  measures  in  the 
distribution. 

2.  The  mean  deviation  is  the  mean  of  all  the  deviations 
from  a  measure  of  central  tendency,  such  as  the  median  or 
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mean.     When  laid  off  on  each  side  of  the  average  in  a  nor- 
mal distribution,  it  includes  the  middle  half  of  the  cases. 

3.  The  standard  deviation  is  the  square  root  of  the  mean 
of  the  squares  of  all  deviations  when  the  deviations  are 
measured  from  a  measure  of  central  tendency,  either  the 
mean  or  the  median.  When  thus  laid  off,  it  includes  approxi- 
mately the  middle  two-thirds  of  the  distribution. 

4.  The  quartile  deviation,  median  deviation.  The  quar- 
tile  or  median  deviation  applies  to  that  portion  of  the 
distribution  contained  between  the  first  and  third  quartiles 
and  is  computed  by  taking  one-half  the  range  contained  in 
the  middle  half  of  the  distribution.  It  may  be  computed 
from  the  formula 


where  Qs  represents  the   third  quartile  and   Qi   the  first 
quartile. 

Another  term  frequently  used  is  the  'probable  error  discussed 
in  a  previous  chapter.  If  the  distribution  is  symmetrical 
then  the  probable  error  equals  the  median  deviation  and 
includes  the  middle  half  of  the  cases  in  the  distribution.  If 
the  distribution  is  not  symmetrical,  but  skewed,  it  is  ques- 
tionable whether  the  term  should  be  used.  The  term  really 
belongs  in  sampling.  Sampling  is  used  under  the  following 
conditions:  In  statistics  it  is  usually  impossible,  or  at  least 
not  convenient,  to  obtain  all  the  measures  of  any  group  of 
things  under  consideration,  so  we  take  samples  and  judge 
the  entire  group  by  the  samples  taken.  When  a  farmer 
brings  his  wheat  to  town,  the  man  at  the  elevator  does  not 
examine  critically  the  entire  load  but  takes  a  sample  from 
the  front  end  of  the  load,  one  from  the  rear,  and  perhaps 
one  from  the  middle,  and  judges  the  entire  load  from  the 
samples  taken.  In  a  city  school  system  a  superintendent 
may  desire  to  know  what  score  the  fourth-grade  children 
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are  able  to  make  on  a  certain  test.  It  may  be  that  he  does 
not  have  time  to  test  the  fourth  grade  throughout  the  entu'e 
city.  He  therefore  chooses  a  few  schools  at  random  and 
judges  the  entire  foui'th  grade  by  the  samples  taken.  But 
to  be  scientific  he  must  know  how  reliable  his  samples  are. 
That  is,  if  he  had  taken  samples  from  other  schools,  would 
they  have  differed  radically  from  those  taken?  How  dif- 
ferent would  the  results  have  been  if  he  had  tested  the 
fom'th  grade  through  the  entire  city?  Probable  error  is  a 
means  of  testing  the  reliability  of  samples.  It  is  a  quantity 
such  that  we  would  obtain  values  of  greater  and  less  mag- 
nitude with  equal  frequency  if  more  cases  or  samples  were 
taken.  If  the  distribution  is  normal,  it  is  evident  that  there 
will  be  as  many  measures  greater  than  those  in  the  middle 
half  as  there  are  those  that  are  less.  Or,  it  is  a  "  fifty-fifty  " 
chance,  when  measures  are  under  consideration  not  included 
in  the  middle  half,  that  they  wiU  be  larger  than  those  in  the 
middle  half  as  often  as  they  are  smaller.  For  example, 
suppose  a  city  superintendent  desired  to  know  the  mean 
score  of  the  eighth-grade  children  on  the  Monroe  Silent 
Reading  Test  and  that  he  did  not  have  time  to  test  aU  the 
eighth-grade  children  throughout  the  entire  city  or  that  the 
expense  were  too  great.  He  might  then  test  ten  classes, 
for  instance,  and  determine  the  mean  grade  of  the  classes 
chosen.  Let  us  suppose  that  the  mean  score  was  25.  He 
would  next  want  to  know  how  near  this  score  of  25  would 
be  to  the  mean  score  if  he  had  tested  all  of  the  eighth-grade 
children  throughout  the  entire  city.  He  knows  that  as  he 
went  from  school  to  school  testing  the  eighth  grades,  the 
mean  score  of  some  of  them  was  more  than  25  and  for  others 
it  was  less.  In  making  his  calculations  let  us  suppose  that 
he  found  the  probable  error  to  be  6.  This  means  that  in 
taking  his  ten  samples  as  often  as  he  found  a  measure  19 
or  less  (6  below  25),  he  would  find  one  31  or  more  (6  more 
than  25).     It  is  evident  that  the  greater  the  difference  in 
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reading  ability  found  in  the  samples  taken,  the  greater  the 
probable  error.  In  other  words  a  small  P.E.  means  a 
rather  homogeneous  group.  It  is  also  evident  that  P.E. 
would  not  be  the  correct  measure  to  use  unless  the  distri- 
bution were  normal.  Since  the  quartile  deviation  is  deter- 
mined by  counting  the  number  of  measures  between  the 
first  and  third  quartiles,  and  taking  one-half  of  them,  it  is 
really  not  deviation  at  all. 

Computation  of  the  Mean  Deviation. — We  have  indi- 
cated above  that  the  quartile  deviation  is  not  a  deviation 
from  any  particular  average  and  takes  account  of  the  form 
of  the  distribution  only  in  an  indirect  manner.  The  mean 
deviation,  on  the  other  hand,  is  a  real  measure  of  deviation 
from  a  measure  of  central  tendency. 

The  illustration  in  Table  XXIX  will  make  clear  the 
significance  of  the  mean  deviation. 

Table  XXIX. — Grades  Made  by  Ten  Pupils  in  the  First- Year 

Algebra 


Pupils 

Grade 

Deviation  from  Mean 

A 

79 

6.6 

B 

91 

5.4 

C 

76 

9.6 

D 

86 

0.4 

E 

88 

2.4 

F 

90 

4.4 

G 

95 

9.4 

H 

94 

8.4 

I 

78 

7.6 

J 

79 

6.6 

10)856 

10)60.8 

85.6  = 

=  mean 

6 .  08  =  mean  deviation 

We  note  that  the  mean  grade  for  this  group  of  ten  pupils 
is  85.6.  No  pupil  makes  the  mean  grade.  The  grade 
made  by  a  differs  from  the  mean  6.6.  That  made  by  b 
differs  by  5.4  and  so  on.     The  simi  of  all  the  deviations 
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from  the  mean,  without  reference  to  signs  (that  is,  without 
reference  to  whether  they  are  above  the  mean  or  below  it), 
divided  by  the  number  of  cases,  which  is  10,  gives  the  average 
or  mean  deviation,  6.08. 

The  mean  deviation  may  be  computed  from  either  the 
mean  or  the  median.  It  would  be  the  same  from  either  if 
the  distribution  were  symmetrical.  If  only  sHghtly  unsym- 
metrical,  it  would  be  the  same  to  the  second  decimal  point. 
If  the  distribution  is  considerably  skewed,  however,  the 
deviation  is  less  from  the  median  because  the  arithmetic 
mean  is  affected  by  both  the  size  of  the  items  and  the  fre- 
quencies. In  the  case  of  the  median  only  the  frequencies 
and  the  size  of  the  items  near  the  center  of  the  distribution 
affect  this  measure.  The  following  illustration  from  Bowley 
will  make  clear  the  reason  why  deviations  from  the  median 
are  less  than  from  the  mean.^ 

Suppose  that  it  is  required  to  run  from  a  telephone  exchange 
separate  wires  to  every  one  of  A''  places  in  a  straight  line,  where 
should  the  exchange  be  placed  so  as  to  use  the  least  total  amount 
of  wire?  At  the  median  position.  For  if  you  move  from  the 
median  position  to  the  right,  or  to  the  left,  you  will  find  immediately 
that  you  are  adding  more  wire  than  you  are  subtracting.  Sup- 
posing there  are  20  stations,  and  you  have  a  position  between  the 
10th  and  11th;  if  you  move  to  a  position  between  the  Uth  and 
12th  you  have  to  increase  your  distance  from  ten  stations  and 
diminish  it  from  nine,  in  every  case  by  the  same  length  of  the  wire. 
The  wires  correspond  to  the  deviations;  and  the  sum  of  the 
lengths  of  the  wires  is  the  sum  of  the  lengths  of  the  deviations. 

It  is  evident  that  with  an  arithmetic  mean  there  may  be 
more  cases  on  one  side  of  the  mean  than  on  the  other;  there- 
fore, the  sum  of  the  distances  from  the  mean  to  the  various 
cases  would  be  greater  than  from  the  median.  From  a 
mathematical  standpoint  it  would  seem,  therefore,  that  the 
median  is  the  proper  measure  of  central  tendency  to  use  in 
computing  the  mean  deviation. 

1  A.  L.  Bowley,  Measurement  of  Groups  and  Series,  p.  30. 
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Computation  of  the  Mean  Deviation:  Data  Grouped  in 
a  Frequency  Distribution. — In  Table  XXIX  we  illustrated 
the  computation  of  the  mean  deviation  where  the  number 
of  cases  was  small  and  the  data  were  ungrouped.  We  shall 
now  illustrate  the  computation  with  data  grouped  in  a 
frequency  distribution.    Table  XXX  illustrates  the  method. 

Table  XXX. — Distkibution  op  Scores  Giv^n  to  288  High-School 
Pupils  in  Plane  Geometry,  Illustrating  the  Computation  of 
THE  Mean  Deviation  by  the  Long  Method 


Class 
Interval 

Mid-point 
of  Class 
Interval 

Frequency 
/ 

Deviation 
d 

Frequency 

X  Deviation 

fd 

95-100 
90-  94.99 
85-  89.99 
80-  84.99 
75-  79.99 
70-  74.99 
65-  69.99 
60-  64.99 
55-  59.99 
50-  54.99 
45-  49.99 
40-  44.99 

97.5 
92.5 
87.5 
82.5 
77.5 
72.5 
67.5 
62.5 
57.5 
52.5 
47.5 
42.5 

20 

62 

49 

27 

48 

23 

18 

21 

9 

6 

3 

2 

14.91 

9.91 

4.91 

0.09 

5.09 

10.09 

15.09 

20.09 

25.09 

30.09 

35.09 

40.09 

298.20 
614.42 
240.59 
2.43 
244.32 
232.07 
271.62 
421.89 
225.81 
180.54 
105.27 
80.18 

AT  =  288 

2,917.34 

1  =  144 


2,917.34 

288 


=  10. 13  ikf.D. 


The  true  niedian  is  82.59  (computation  not  shown  here). 

The  computation  of  the  mean  deviation  by  the  method 
given  in  Table  XXX  involves  a  great  amount  of  work  by 
reason  of  the  fact  that  the  exact  deviations  are  taken  from 
the  true  median.  These,  in  most  cases,  are  fractions  that 
must  be  multiplied  by  the  frequency.  Much  labor  may  be 
saved  by  assuming  that  the  median  is  at  the  mid-point  of 
the  class  interval  in  which  the  true  median  is  located  and  by 
reckoning  the  deviation  in  terms  of  class  intervals  instead 
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of  in  terms  of  the  units  of  class 
intervals.  We  may  then  make 
the  proper  correction  for  the 
assumptions  made.  That  is,  if 
the  deviations  are  reckoned  about 
an  assumed  median  instead  of  the 
true  median,  the  proper  correc- 
tions must  be  made  for  the  differ- 
ence between  the  assumed  and 
the  true  medians. 

It  will  be  found  that  the  sum 
of  the  deviations  about  the 
assumed  median  is  always  less 
than  those  about  the  true  median; 
hence  the  correction  must  always 
he  added. 

This  may  be  illustrated  by 
Figure  VIII.  Suppose  we  have 
a  distribution  the  range  of  which 
is  from  40  to  100  and  that  the 
range  is  divided  into  class  inter- 
vals of  five  units  each.  Let  us 
suppose  that  the  true  median  is 
82.59  as  in  Table  XXX.  The 
assumed  median  is  at  82.5.  Since 
the  true  median  is  82.59,  it  means 
that  there  are  as  many  pupils 
who  receive  grades  above  82.59 
as  there  are  below  it.  But  we 
make   an   assumption   that    the 

j      median   is   located  at  82.5   and 
that    all   the   measures    in    the 

1      interval  80-85   lie  at   the  mid- 

i      point     of    this     class     interval. 

I      Therefore    the    27   cases    would 


Class  Interval 
100 


95 


90 


85 


TruQ 

median,   8?.59 
Assumed     82.5 

median. 


80 


75 


70- 


es- 


se 


55 


50- 


45 


Casss 
—20 


4a 

FiGURK  VIII 


-62 


-49 


-27 


-48 


-23 


-18 


-21 


312 


EDUCATIONAL  MEASUREMENT 


be  considered  as  lying  below  the  true  median  and  would  be 
counted  with  those  below.  It  is  also  evident  that  there 
are  now  more  cases  below  the  true  median  than  above  it, 
when  the  27  cases  in  the  80-85  interval  are  assumed  to  be 
located  at  the  mid-point,  or  at  82.5.  This  means  that  all 
the  cases  below  the  true  median  are  too  short  by  the  differ- 
ence between  the  true  and  assumed  medians,  or  that  they 
are  short  0.09  of  a  unit  or  0.09/5  of  a  class  interval,  or  0.018 
of  a  class  interval.  It  also  means  that  all  the  cases  above 
the  true  median  are  too  long  by  the  difference  between  the 

Table' XXXI. — Distribution  of  Scores  Given  to  288  High-School 
Pupils  in  Plane  Geometry,  Illustrating  the  Computation 
OF  THE  Mean  Deviation  by  the  Short  Method 


Class  Interval 

/ 

d 

fd 

95-100 

20 

3 

60 

90-  94.99 

62 

2 

124 

85-  89.99 

49 

1 

49 

80-  84.99 

27 

0 

0 

75-  79.99 

48 

1 

48 

70-  74.99 

23 

2 

46 

65-  69.99 

18 

3 

54 

60-  64.99 

21 

4 

84 

55-  59.99 

9 

5 

45 

50-  54.99 

6 

6 

36 

45-  49.99 

3 

7 

21 

40-  44.99 

2 

8 

16 

iV  =  288 

2/d  =  583 

True  median 82 ,  59 

Assumed  median 82 . 5 

131  number  of  cases  above  assumed  median 
157  number  of  cases  below 


26  difference 

82.59-82.5 

5 

583+0.468 


=  0.018 


26X. 018=0. 468 


=  2 .  026  in  units  of  class  intervals 


288 
2. 026X5  =  10. 13  M.D 
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true  and  assumed  medians.  The  number  above  is  131, 
since  the  cases  in  the  80-85  interval  are  counted  with  those 
below.  The  number  below  is  157.  Therefore  131  cases 
are  too  long  and  157  cases  too  short,  and  the  difference 
between  them,  or  26  cases,  are  too  short  by  0.018  of  a  class 
interval,  hence  the  correction  must  be  added.  If  the  true 
median  had  been  below  the  assumed  median,  the  number 
of  cases  above  would  have  exceeded  those  below,  the  num- 
ber of  cases  above  would  have  been  too  short,  and  the  cor- 
rection would  have  had  to  be  added  as  in  the  case  cited. 

Table  XXXI  illustrates  the  computation  of  the  mean 
deviation  by  the  short  method.  In  this  table  we  shaU  use 
the  same  data  used  in  Table  XXX. 

If  it  is  desired  to  use  a  general  algebraic  formula  for  finding 
the  mean  deviation  by  the  short  method,  the  operation  may 
be  expressed  thus: 


M.D.  =  p^±^f=M^X5 


where  M.D.  =  the  mean  deviation,  /=  the  number  of  cases 
in  each  class  interval,  d  =  ihe  deviation  in  units  of  class 
intervals,  c  =  the  correction,  which  is  the  true  median  minus 
the  assumed  median,  divided  by  the  number  of  cases  in  the 
class  interval;  A^6  =  the  number  of  measures  below  the  true 
median,  Na=the  number  of  measures  above  the  true  median, 
and  N  =  the  number  of  measures  in  the  entire  distribution. 
5  =  the  number  of  units  in  the  class  interval.  Substituting 
in  the  general  formula  to  find  the  mean  deviation  in  Table 
XXXI: 

iVs  =  157    True  median  =  82.59    Assumed  median  =  82.5 
iV„  =  131     2/d  =  583     c^82.59-82.5^^^^g 

N  =288.-.   M.^.J83+0.018(157-131)J83 +0.018X26 

288  288 

2.026X5  =  10.13  M.Z). 
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SUMMAKY   OF   StEPS  IN   THE   COMPUTATION   OF  THE   MeAN 

Deviation  by  the  Short  Method 

1.  Arrange  the  data  in  a  frequency  distribution  of  four  columns. 
Let  the  first  column  to  the  left  contain  the  class  intervals;  the 
second  one  the  frequencies  (/);  the  thnd  the  deviations  (d);  and 
the  fourth  the  product  of  the  frequencies  times  the  deviations  (Jd). 

2.  Sum  the  frequencies  in  the/  column. 

3.  Compute  the  true  median  (TMa). 

4.  Take  the  mid-point  of  the  class  interval  containing  the  true 
median  as  the  assumed  median  {AM a). 

5.  Find  the  difference  between  the  true  median  and  the  assumed 
median  and  divide  this  difference  by  the  number  of  units  in  the 
class  interval.  This  will  give  the  difference  in  terms  of  the  class 
interval.     Call  this  the  correction  (c). 

6.  Compute  the  number  of  cases  above  and  below  the  true 
median  in  column  /.  Care  must  be  taken  to  see  whether  the  cases 
in  the  class  interval  containing  the  medians  shall  be  added  to  the 
cases  above  the  true  median  or  below  it.  If  the  assumed  median 
falls  below  the  true  median,  then  the  case  in  the  class  interval 
containing  the  medians  must  be  added  to  those  below  the  true 
median  for  reasons  given  on  page  311.  If  the  assumed  median 
falls  above  the  true  median,  add  the  cases  in  the  class  intervals 
containing  the  medians  to  those  above.  Multiply  the  correction 
found  in  step  5  by  the  difference  between  the  number  of  cases  above 
and  the  number  below  the  true  median. 

7.  Tabulate  the  deviations  (d)  from  the  mid-point  of  the  class 
interval  containing  the  assumed  median,  giving  the  first  class 
interval  above  a  de'S'iation  of  1,  the  second  one  a  deviation  of  2, 
etc.  Record  the  dcAdations  below  the  interval  containing  the 
assumed  median  in  the  same  way. 

8.  Multiply'-  each  frequency  (/)  by  its  respective  de\aation  and 
record  the  results  in  the  proper  jilace  in  the  fd  column. 

9.  Find  the  sum  of  the  fd's  without  regard  to  sign. 

10.  Add  the  total  correction  from  step  6  above  to  the  total  num- 
ber of  deviations  from  the  median  i'^fd). 

11.  Divide  this  sum  by  the  total  number  of  cases  in  the  distri- 
bution (A^)  to  get  the  mean  deviation  about  the  true  median.  The 
deviation  thus  computed  is  in  terms  of  class  intervals;  therefore 
multiply  this  result  by  the  number  of  units  in  the  class  interval  in 
order  to  find  the  deviation  in  terms  "f  the  original  measures. 
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The  Computation  of  Standard  Deviation. — We  defined 
standard  deviation  (page  306)  as  the  square  root  of  the 
arithmetic  mean  of  the  squares  of  all  the  deviations  meas- 
ured from  the  arithmetic  mean  of  the  distribution. 

If  the  series  is  simple  and  there  is  only  one  or  a  few- 
measures  of  any  given  magnitude,  the  formula  ia  this  case 
is  as  follows: 


In  substituting  in  this  formula  we  simply  find  the  amount 
each  measure  deviates  from  the  mean,  square  it,  add  the 
squares  of  all  the  deviations,  divide  by  the  total  nmnber  of 
cases,  and  extract  the  square  root. 

If,  however,  the  number  of  cases  is  large,  we  arrange  the 
data  in  a  frequency  distribution  and  apply  the  short  method. 
The  formula  in  this  case  is 


in  which  o-  =  the  standard  deviation,  S=the  sum  of  the/d^'s, 
/=the  frequencies  or  the  number  of  measures  in  each  class 
interval,  c^  =  the  deviations  from  the  mean,  and  iV  =  the 
number  of  measures  in  the  whole  distribution. 

The  Computation  of  Standard  Deviation  by  the  Short 
Method. — We  shall  note  the  short  method  of  computing 
the  standard  deviation  and  shall  then  compare  standard 
deviation  with  mean  deviation,  and  note  wherein  one  seems 
to  be  superior  to  the  other.  Table  XXXII  illustrates 
the  computation  of  the  standard  deviation  by  the  short 
method,  and  the  steps  in  the  procedure  are  given  in  the 
table  on  the  following  page. 
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Table  XXXII. — Distribution  op  Scores  Given  to  288  High-School 
Pupils  in  Plane  Geometry,  Illustrating  the  Computation 
OF  Standard  Deviation  by  the  Short  Method 
{Data  taken  from  Table  XXXI) 


Class 
Interval 

/ 

d 

Id 

/d^ 

95-100 

20 

3 

60 

180 

90-  94.99 

62 

2 

124 

248 

85-  89.99 

49 

27 

1 
0 

49 

49 

80-  84.99 

233 

7&-  79.99 

48 

-1 

-  48 

48 

70-  74.99 

23 

-2 

-  46 

92 

65-  69.99 

18 

-3 

-  54 

162 

60-  64.99 

21 

-4 

-  84 

336 

55-  59.99 

9 

-5 

-  45 

240 

50-  54.99 

6 

-6 

-  36 

216 

45-  49.99 

3 

-7 

-  21 

147 

40-  44.99 

2 

-8 

-   16 

128 

iV  =  288 

-350 

233 

288) -117 

-.406 

288)1,846 

6.41  =S» 

c= -0.406 
c2=     0.165 


6.41- 


■0.165  =6.245  =cr2 

0-  =  2 .  50  class  intervals 

2 .  50X  5  =  12 .  50  actual  units 


Summary  of  Steps  in  the  Computation  of  Standard  Devia- 
tion BY  THE  Short  Method 

1.  Tabulate  the  data  in  a  frequency  distribution  as  in  the  com- 
putation of  the  mean  deviation,  adding  a  fifth  column  to  the  right 
to  contain  the  /ci^'g, 

2.  Estimate  the  interval  which  contams  the  mean  (80-84.99). 
This  may  be  chosen  anywhere  in  the  distribution;  but  if  chosen 
in  the  same  interval,  or  near  the  true  mean,  the  computation  is 
simplified. 

3.  Tabulate  the  deviations  from  the  estimated  mean  (the  mid- 
point of  the  class  interval)  in  units  of  class  intervals;  that  is,  the 
first  interval  above  will  have  a  deviation  of  1,  the  second  one  a 
deviation  of  2,  and  so  on;  the  first  interval  below  will  have  a 
deviation  of  —1,  the  second  one  will  have  a  deviation  of  —2,  and 
soon. 
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4.  Multiply  each  frequency  by  its  corresponding  deviation  and 
record  in  the  Jd  column. 

5.  Find  the  algebraic  sum  of  the  /d's;  that  is,  find  the  sum  of 
the  +/<^'s  and  the  sum  of  the  — /d's,  and  take  their  difference: 
s/<i=  233-350  =-117. 

6.  Find  the  correction  (c)  by  dividing  Jd  by  the  total  number 
of  cases  in  the  distribution :    - 1 17-^  288  =  -0.406. 

7.  Multiply  each  jd  by  d,  its  corresponding  deviation,  and  record 
in  the  column  headed  /d^,  (The  student  should  use  the  table  of 
squares,  Table  I,  Appendix,  for  squaring  numbers,  also  for 
extracting  roots.) 

8.  Find  the  sum  of  the/d^'s:  2/^2  =  1,846. 

9.  Divide  the  sum  of  the  fd^'s  by  the  number  of  cases  in  the 
whole  distribution  to  get  ^S^:  l,846-^  288  =6.41.  This  is  the 
square  of  the  standard  deviation,  but  it  is  computed  from  an 
estimated  mean,  and  we  must  find  it  from  the  true  mean.  From 
the  previous  discussion  it  is  clear  that  the  mean  of  the  deviations 
about  the  estimated  mean  must  be  in  error  by  an  amount  equal 
to  the  arithmetic  mean  of  the  difference  of  the  positive  and  nega- 
tive deviations  in  column  4;  that  is,  the  arithmetic  mean  of  the 
squares  of  the  deviations  will  be  in  error  by  an  amount  equal  to 
the  squares  of  this  difference,  or,  c^.  Square  the  correction  c, 
giving  c2,  or  0.165  and  subtract  c^  from  S-,  giving  o-^.  It  was  noted 
in  step  3  that  the  deviations  were  in  terms  of  class  intervals; 
therefore  o-^  will  be  in  terms  if  class  intervals,  and  its  square  root 
will,  of  course,  be  in  the  same  terms.  Multiplying  a  by  the  number 
of  units  in  the  class  interval  (5)  will  give  the  desired  standard 
deviation,  12.50. 

The  methods  of  computing  the  mean  and  standard  devia- 
tions having  been  presented,  we  are  now  in  a  position  to 
compare  them  and  discuss  the  superiority  of  one  over  the 
other. 

In  the  introductory  chapter  on  statistical  methods  we 
devoted  considerable  space  to  the  exposition  of  the  arith- 
metical mean  as  the  basis  upon  which  rests  the  concept  of 
error  in  a  series  of  observations.  We  also  discussed  the 
method  of  least  squares  as  a  means  of  finding  the  most 
probable  value  of  a  series  of  observations.  We  are  now 
prepared  to  show  the  significance  of  mean  and  standard 
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deviations  graphically.  This  can  be  done  best,  perhaps, 
by  using  the  illustrations  given  by  Roberts.^  Suppose 
A  and  B  are  two  marksmen  firing  at  a  target  the  .center  of 
which  is  C,  Figure  IX. 

Suppose  each  man  fires  ten  shots.     Let  the  crosses  represent 
the  hits  made  by  a  and  the  circles  the  hits  made  by  b. 


Figure  IX.    shots  fired  at  a  target  by  two  marksmen;  illustrat- 
ing THE  DIPFFERENCE  BETWEEN  MEAN  AND  STANDARD  DEVIATION 

Table  XXXIII  shows  the  distance  of  each  shot  from  the 
center  of  the  target. 

Considering  the  amount  each  man  misses  the  target  as  a 
deviation,  we  find  that  the  sum  of  the  deviations  of  each 
marksman  is  50  and  the  mean  deviation  is  5.     Therefore, 


*  "A  Practical  Method  for  Demonstrating  the  Error  of  Mean  Square,' 
School  Science  and  Mathematics,  Vol.  19,  pp.  677-692. 
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Table  XXXIII 


Distance  of  Shots  from  Center  of  Target,  Inches 

Shot  Number 

A 

B 

1 

2 

4 

2 

8 

5 

3 

3 

6 

4 

7 

4 

5 

9 

3 

6 

4 

5 

7 

9 

5 

8 

5 

6 

9 

9 

7 

10 

1 

5 

50 

50 

if  mean  deviation  were  taken  as  a  measure  of  their  marks- 
manship, there  would  be  a  tie.  If,  however,  we  were  to 
compute  their  marksmanship  in  terms  of  standard  devia- 
tion, they  would  not  tie,  but  b  would  be  the  winner. 

Table  XXXIV  shows  the  standard  deviation  in  each 
series. 

Formerly  the  scores  at  rifle  practice  in  the  Belgian  army 
were  determined  by  adding  the  deviations  of  each  man's 
shots  from  the  center  of  the  target,  and  the  marksman 
having  the  smallest  sum  was  the  winner. 

It  is  evident  from  looking  at  the  scores  made  by  a  and  b 
in  Table  XXXIV  that  b  is  a  more  consistent  marksman 
than  A  because  his  marksmanship  is  less  variable;  that  is, 
he  "  bunches  his  shots."  At  one  time  a  fires  a  shot  close 
to  the  target,  and  the  next  shot  misses  the  target  by  many 
feet.  B  does  not  hit  as  close  to  the  target  as  does  a  but  is 
far  more  consistent  in  his  shooting. 

The  mathematical  significance  of  standard  deviation 
may  be  further  demonstrated  as  follows:  In  the  United 
States  army,  the  standard  size  target  used  on  rifle  ranges 
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Table  XXXIV 


A 

B 

Number 

of  Shots 

d 

d2 

d 

d2 

1 

2 

4 

4 

16 

2 

8 

64 

5 

25 

3 

3 

9 

6 

36 

4 

7 

49 

4 

16 

5 

9 

81 

3 

9 

6 

4 

16 

5 

25 

7 

2 

4 

5 

25 

8 

5 

25 

6 

36 

9 

9 

81 

7 

49 

10 

1 

1 

5 

25 

50 

10)334 
33.4 

50 

10)262 
26.2 

V33 . 4  =  Standard  deviation  of  a  =  5.77 
V26 . 2  =  Standard  deviation  of  b  =  5 .  11 

at  200-300  yards  distance  is  rectangular  in  shape,  4X6  feet 
in  dimensions,  in  the  center  of  which  are  three  concentric 
rings,  of  which  the  central  one  (called  the  bull's-eye)  has  a 
diameter  of  8  inches,  the  second,  a  diameter  of  26  inches, 
and  the  largest  one,  a  diameter  of  46  inches.  In  firing  at  a 
target  of  this  kind  the  score  is  made  up  as  follows: 

First  ring,  or  bull's-eye 5  points 

Second  ring 4  points 

Third  ring 3  points 

Outside  target 2  points 

As  indicated  above,  shots  placed  in  a  target  may  be  taken 
in  the  sense  of  errors,  or  deviations  from  a  mean.  The 
mean  in  this  case  is  the  mathematical  center  of  the  target. 
The  distance  of  each  shot  from  the  center  is  its  deviation. 
This  distance  from  the  center  of  a  target  may  be  considered 
as  a  radius  of  a  circle,  and,  since  a  shot  may  be  placed  at  any 
point  in  a  circle  around  the  center  of  the  target  within  the 
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limits  of  its  radius,  we  may  treat  the  various  misses  or 
deviations  from  the  center  as  being  proportional  to  the 
area  of  a  circle  irr^,  of  which  r  is  the  distance  of  thejshot 
from  the  center.  We  may  then  determine  the  sum  of  the 
areas  of  the  various  circles  formed  by  each  man's  shots  taken 
as  radii,  and,  from  our  reasoning  above,  the  man  having  the 
smaller  total  sum  of  circle  areas  will  be  the  winner.  Or, 
stating  the  relative  merits  of  their  marksmanship  in  a 
mathematical  formula  we  may  say  a  :  b  :  :  ^(jr^)  :  S(7rn^). 
TT,  being  a  constant,  may  be  eliminated,  and  the  formula 
may  be  stated  thus  a  :b  :  :  (Zr^)  :  (Srr).  The  rifleman 
having  the  smaller  sum  of  squared  deviations  would  win. 

If,  however,  we  wish  to  carry  the  measurements  a  step 
further  and  obtain  a  measure  of  the  relative  average  fluctu- 
ation or  miss  from  the  center  of  the  target  which  will  be  an 
absolute  measure  of  their  relative  marksmanship,  we  may 
divide  the  sum  of  the  squared  radii  (r^)  by  the  number  of 
shots  (iV)  in  order  to  get  the  average  squared  error  or  miss. 
The  square  root  of  this  will  be  the  best  possible  measure  of 
variability  from  the  center  of  the  target  taken  as  a  mean. 

Another  point  in  favor  of  the  use  of  standard  deviation 
is  the  fact  that  it  bears  a  definite  relation  to  the  normal 
probability  curve,  or  curve  of  error.  It  bears  the  same 
relation  to  the  curve  that  the  radius  of  a  circle  bears 
to  the  circle.  It  is  therefore  a  constant  and  limits  the 
spread  of  the  curve  so  that,  when  the  standard  deviation  is 
small,  the  measures  are  concentrated  near  the  center,  and 
the  curve  rises  rapidly,  whereas,  if  the  standard  deviation 
is  great,  the  curve  is  flat,  and  the  measures  are  scattered 
widely  from  the  center.  As  stated  above,  it  is  a  constant 
and  marks  the  point  where  the  curvature  of  the  curve 
changes  from  the  convex  to  the  concave  as  you  go  from  the 
center.     Its  value  includes  68.26  per  cent  of  the  cases. 

The  CoeflBcient  of  Variability. — In  the  foregoing  pages 
we  have  noted  the  principal  methods  of  representing  absolute 
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variability  of  frequency  distributions.  With  the  exception 
of  the  cases  of  variability  in  reference  to  marksmanship  no 
reference  was  made  to  the  relative  variabilities  of  two  or 
more  distributions.  The  emphasis  was  rather  to  compre- 
hend more  fully  the  distribution  of  the  measures  in  refer- 
ence to  their  measure  of  central  tendency.  To  measure 
the  amount  of  "  scatteration,"  or  spread  from  the  measure 
of  central  tendency,  we  employed  these  measures:  (1) 
mean  deviation,  (2)  standard  deviation,  and  (3)  probable 
error.  It  would  be  a  simple  matter  to  compare  the  varia- 
bility of  one  distribution  with  another  by  the  foregoing 
methods,  since  the  units  of  variabiHty  were  the  same; 
that  is,  in  the  case  of  grades  they  were  recorded  in  per  cent 
and  in  the  case  of  marksmanship  they  were  recorded  in 
inches.  It  very  frequently  happens,  however,  that  we  desire 
to  compare  the  variability  of  one  distribution  with  another 
when  the  units  are  different,  as,  for  instance,  comparing 
the  salaries  of  teachers  with  the  length  of  time  they  have 
been  in  the  service.  In  a  distribution  of  this  kind  one  of 
the  units  would  be  doUars  and  the  other  years.  It  is  there- 
fore necessary  to  devise  a  measure  of  relative  variability  to 
cover  these  cases. 

Pearson  has  devised  such  a  measure,  which  he  calls  the 
coefficient  of  variation.  It  is  the  measure  of  the  ratio  of 
absolute  variability  (standard  deviation,  mean  deviation, 
quartile  deviation,  or  probable  error)  to  the  average  from 
which  these  deviations  are  taken  (arithmetic  mean,  or 
median.)     It  may  be  expressed  by  the  formula 

lOOo- 
m 

in  which  V  is  the  measure  of  variability,  o-  is  the  standard 
deviation,  and  M  the  median,  or  mean.  By  this  measure 
one  is  merely  finding  the  per  cent  that  the  absolute  varia- 
bility bears  to  the  average  from  which  the  deviations  are 
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computed.     It  is  clear  that  any  other  measure  of  varia- 
bility might  be  used  instead  of  the  standard  deviation. 

A  measure  of  this  kind  is  independent  of  the  units  used 
and  will  show  relative  variability  even  though  the  units  in 
the  two  distributions  compared  are  entirely  different. 
Thorndike  proposes  to  take  the  square  root  of  the  measure 
of  central  tendency  instead  of  using  it  as  did  Pearson.  His 
formula  would  read : 

100  M.I>. 


V= 


VMedian 


The  Pearson  coefficient  of  variability  seems  to  be  a  better 
measure,  however,  taking  one  distribution  with  another, 
than  the  one  proposed  by  Thorndike. 


CHAPTER  XII 

THE  MEASUREMENT  OF  RELATIONSHIP,  OR  CORRELATION 

Need  for  Measures  of  Relationship. — One  more  group 
of  measures  is  necessary  to  equip  the  student  with  adequate 
means  for  dealing  with  educational  data.  We  desire  not 
only  to  know  the  distribution  of  measures  in  a  series  of 
educational  data,  but  many  times  we  desire  to  compare 
one  series  with  another  and  therefore  need  some  measure 
that  takes  cognizance  of  the  distribution  of  the  various  cases 
in  the  series  and  at  the  same  time  expresses  the  movements 
of  the  group  as  a  whole. 

It  is  sometimes  stated  that  there  is  a  high  correlation 
between  abilities  in  mathematics  and  abihties  in  Latin. 
In  order  to  compare  two  groups  of  this  kind  it  is  necessary 
to  have  measures  that  express  the  efficiency  of  each  group 
as  a  whole.  It  is  sometimes  necessary  to  know  how  one 
group  varies  as  compared  with  another.  The  comparison 
of  one  series  of  data  with  another  is  usually  spoken  of  as 
correlation. 

The  measurement  of  relationship  or  correlation  is  a  tech- 
nical thing.  Its  relation  to  causation  is  also  technical  and 
difficult  to  understand.  In  order  to  help  clarify  the  sub- 
ject and  present  it  from  a  number  of  points  of  view,  we 
shall  refer  to  the  statement  of  the  problem  as  discussed  by  a 
number  of  the  leading  statisticians  who  have  written  on 
the  subject.  We  shall  also  give  a  number  of  simple  illus- 
trations which  will  throw  some  light  on  the  solution  of  the 
problem. 

Comparison  involves  the  pairing  of  things  or  events  that 
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are  not  identical  in  all  particulars  as  to  time,  place,  and 
condition.  A  study  of  cause  and  effect,  whether  coincidence 
or  sequence,  becomes  largely  a  study  of  association.  Causes 
never  operate  twice  under  exactly  the  same  circumstances. 
Oneness  of  effect  is  only  apparent.  When  making  compari- 
sons in  education  or  psychology,  there  is  a  tendency  to 
attempt  to  safeguard  oneself  against  error  and  criticism  by 
introducing  the  proviso,  "  other  things  being  equaV^  But 
"  other  things "  are  rarely,  if  ever,  equal  in  actual  life. 
Pearson  says:^ 

Individual  phenomena  can  only  be  classified  and  our  problem 
turns  on  how  far  a  group  or  class  of  like,  but  not  absolutely  same, 
things  which  we  term  "  causes  "  will  be  accompanied  or  followed 
by  another  group  or  class  of  like,  but  not  absolutely  same,  things 
which  we  term  "  effects." 

Bowley  discusses  correlation  as  follows:^ 

When  two  quantities  are  so  related  that  the  fluctuations  in  one 
are  in  sjinpathy  with  fluctuations  in  the  other,  so  that  an  increase 
or  decrease  of  one  is  found  in  connection  with  an  increase  or  decrease 
(or  inversely)  of  the  other,  and  the  greater  the  magnitude  of  the 
changes  in  the  one,  the  greater  the  magnitude  of  the  changes  in 
the  other,  the  quantities  are  said  to  be  correlated. 

Davenport  says :  ^ 

The  whole  subject  of  correlation  refers  to  that  interrelation 
between  separate  characters  by  which  they  tend,  in  some  degree 
at  least,  to  move  together.  This  relation  is  expressed  in  the  form 
of  a  ratio. 

In  reference  to  zero  correlation  he  says : 

If  the  characters  in  question  are  absolutely  indifferent  the  one  to 
the  other,  the  correlation  is  said  to  be  zero,  indicating  mere  associ- 
ation under  the  law  of  independent  probability,  without  causative 
relation  of  any  kind. 

^  Karl  Pearson,  The  Grammar  of  Science,  p.  157. 

*  A.  L.  Bowley,  Elements  of  Statistics,  p.  316. 

*  Principles  of  Breeding,  p.  453. 
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Pearson  saj^s :  ■* 

When  we  vary  the  cause,  the  phenomena  changes,  but  not 
always  to  the  same  extent;  it  changes,  but  has  variations  in  its 
change.  The  less  the  variation  in  that  change,  the  more  nearly 
the  cause  defines  the  phenomena,  the  more  closely  we  assert  the 
association  or  correlation  to  be.  It  is  this  conception  of  correla- 
tion between  two  occurences  embracing  all  relationships  from 
absolute  independence  to  complete  dependence,  which  is  the 
wider  category  by  which  we  have  to  replace  the  old  idea  of 
causation.  Everything  in  the  universe  occurs  but  once;  there  is  no 
complete  sameness  of  repetition. 

Secrist  says :  ^ 

A  measure  of  correlation  is  a  statement  of  probabiHties,  the 
reliability  of  which  is  determined  by  the  degree  to  which  the 
samples  represent  the  whole  "  population  "  and  the  conditions 
■under  which  the  samples  are  taken,  the  range  of  condition. 

We  are  striving  to  devise  some  methods  of  determining 
the  degree  of  causal  connection  exhibited  by  certain  traits 
and  activities  in  school  work.  The  measuring  of  mental, 
social,  and  physical  activities  constantly  involves  the  study 
of  causation,  or  causal  connection,  between  two  or  more 
traits  in  question. 

One  of  the  psychological  problems  about  which  there  has 
been  much  discussion  in  the  last  decade  is  the  question  as 
to  whether  or  not  "  school "  abilities  are  specialized  or 
general.  For  example,  what  is  the  probability  that  a  pupil 
who  shows  a  high  degree  of  achievement  in  Latin  will  show 
a  high  degree  of  achievement  in  mathematics?  If  we  had 
several  hundred  cases  and  we  found  that  some  students 
who  were  good  in  Latin  were  also  good  in  mathematics, 
that  some  who  were  good  in  Latin  were  mediocre  in  mathe- 
matics, and  that  a  few  who  were  good  in  the  subject  of 

*  The  Gram.mar  of  Science,  p.  157. 

s  An  Introduction  to  Statistical  Methods,  p.  438. 
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Latin  were  very  poor  in  mathematics,  and  vice  versa,  it 
would  be  very  difficult  to  draw  a  conclusion  as  to  the 
correlation  between  the  two  traits  unless  we  had  some 
way  of  measuring  the  amounts  of  correspondence  between 
them. 

Or,  again,  we  might  have  a  gi'oup  of  data  showing  that  the 
student  who  ranked  first  in  Latin  also  ranked  first  in  mathe- 
matics, and  the  student  who  ranked  second  in  Latin  ranked 
second  in  mathematics,  and  so  on.  Data  of  this  kind 
would  show  that  there  is  a  causal  relation  existing  between 
the  two  traits,  but  it  would  not  show  how  much.  This 
method  of  measuring  the  degree  of  correspondence  between 
two  traits  takes  account  only  of  the  position  or  rank  of  the 
various  measures  in  the  series  and  neglects  the  absolute  amounts 
of  the  measures.  For  the  measurement  to  be  complete  the 
actual  proportional  differences  between  each  two  consecu- 
tive marks  must  be  measured. 

Two  methods  have  been  devised  that  take  cognizance 
of  the  actual  size  of  the  various  measures  of  the  traits  and 
show  the  degree  of  correspondence  between  them.  The 
first  is  the  graphic  method,  and  the  second  is  by  the  use  of 
mathematical  formulas.  WhUe  the  first  method  does  take 
cognizance  of  the  size  of  each  measure  in  the  distribution, 
it  nevertheless  must  be  refined  by  the  apphcation  of  mathe- 
matics before  the  absolute  value  may  be  found. 

We  shall  first  show  how  to  represent  the  degree  of  corre- 
lation between  two  traits  by  the  use  of  mathematical  formu- 
las, and  then  present  methods  for  representing  it  graphically. 
It  may  seem  to  be  more  logical  to  present  a  graphic  repre- 
sentation of  the  subject  first,  but  it  is  believed  that  the 
subject  may  be  made  clearer  by  reversing  what  might  seem 
to  be  the  logical  order. 

Perhaps  enough  has  been  said  about  correlation  and  causa- 
tion to  introduce  the  reader  to  the  subject.  We  shall  now 
illustrate  the  computation  of  the  coefficient  of  correlation 
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by  a  number  of  simple  problems  and  supplement  the  previ- 
ous treatment  by  additional  discussion  as  new  problems 
arise. 

1.  Illustrating-  the  Computation  of  the  Coefficient  of 
Correlation :  Data  Simple  and  Ungrouped. — The  unit  with 
which  we  generally  measure  the  degree  of  likeness  or  corre- 
lation of  one  series  with  another  is  called  the  coefficient  of 
correlation.  The  formula  used  is  that  derived  by  Karl 
Pearson  and  is  variously  expressed  as  follows: 

'Exy  'Lxy 

r=    ,         =  or  T^ 


The  Meaning  of  Symbols  Used 

r  =the  coefficient  of  correlation. 

S  =the  summation  of  the  x-deviations  multiphed  by  the  cor- 
responding ?/-de\aations.  (s  always  means  summation 
when  used  in  the  formulas.) 

I  =the  deviation  of  each  particular  measure  from  the  average 
in  the  first,  or  subject  series. 

y  =the  deviation  of  each  particular  measure  from  the  average 
in  the  second,  or  relative  series.  In  representing  the 
deviations  by  x  and  y  the  proper  algebraic  signs  must  be 
retained. 

x^  =the  square  of  the  deviation  of  the  subject  series. 

2/2  =the  square  of  the  deviation  of  the  relative  series. 

o-x  =  (second  formula)  equals  the  standard  de\aation  of  the  first 
series  and  Is  the  same  as  the  Vx^  in  the  first  formula. 

<ry  =  the  standard  deviation  in  the  second  series. 

N  =the  number  of  pairs  of  items  in  the  series. 

The  coefficient  of  correlation  r  is  a  constant  and  a  measure 
of  great  importance  in  expressing  correlation.  It  is  evi- 
dently a  pure  number,  and  its  magnitude  is  unaffected  by 
the  units  in  which  x  and  y  are  measured;  for  the  numerator 
and  denominator  are  affected  to  the  same  extent. 

The  development  of  the  Pearson  formula  gives  values 
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for  r  ranging  from  —1  thi'ough  Oto+1.  Ifr=4-1,  the 
correlation  is  said  to  be  perfect  and  positive  and  means 
that  large  values  in  the  first  series  are  accompanied  by  large 
values  in  the  second  series,  and  vice  versa.  If  r  =  —  1,  the  cor- 
relation is  perfect  but  negative  and  means  that  large  values 
in  the  first  series  are  accompanied  by  small  values  in  the 
second  series,  and  vice  versa.  When  r  =  0  the  two  series 
are  independent. 

Table  XXXV. — Illustkating  the  Computation  op  the  Coeffi- 
cient OF  Correlation  Between  Addition  and  Handwriting 

{Hypothetical  Case) 


Pupils 

Scores  in  Addition 

Scores  in  Handwriting 

(Subject  Series) 

(Relative  Series) 

a 

3 

5 

B 

4 

8 

C 

5 

7 

D 

8 

12 

E 

10 

13 

5)30 

5)45 

6= mean 

9= mean 

The  deviation  of  each  score  from  its  respective  mean  is  given 
below;  also  the  product  of  each  deviation  in  the  first  series 
by  its  corresponding  deviation  in  the  second  series. 


Pupils 

Deviations,  x,  from  the 
Mean  Subject  Series 

Deviations,  y,  from  the 
Mean  Relative  Series 

xy 

A 
B 

c 

D 

E 

-3 

-2 
-1 

+2 
+4 

-4 

-1 
-2 
+3 
+4 

12 
2 

2 

6 

16 

38 
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Squaring    the    individual    deviations    to    find    standard 
deviations  we  have: 


x^ 

y1 

9 

16 

4 

1 

1 

4 

4 

9 

16 

16 

34 


46 


We  are  now  ready  to  substitute  in  the  formula : 
l^xy  38 


r  = 


y/i:x-Zif     V34X46 


=  0.962 


In  order  to  get  a  value  for  r  equal  to  zero,  it  is  evident 
that  the  numerator  of  the  fraction  in  the  above  formula 
must  be  zero.  A  hypothetical  case  given  in  Table  XXXVI 
will  illustrate  this  point. 

Table  XXXVI. — Illustrating  the  Computation  of  the  Coeffi- 
cient OF  Correlation  Equal  to  Zero 


First  Series 

Second  Series 

Deviations, 

First  Series 

Deviations, 
Second  Series 

xy 

95 

92 

+25 

+2 

+50 

85 

90 

+  15 

0 

0 

75 

88 

+5 

-2 

-10 

65 

88 

-  5 

_2 

+10 

55 

90 

-15 

0 

0 

45 

92 

-25 

+2 

-50 

6)420 

6)540 

x2/=0 

70  =  mean 

90  =  mean 

Since  xy^zero,  therefore  the  correlation  between  the  two 
series  is  zero. 
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Table  XXXVII. — Illustrating  a  Perfect  Positive  CoRRELATioisr 


First  Series 

Second  Series 

Deviations, 

First  Series 

Deviations, 
Second  Series 

xy 

20 

30 

-10 

-25 

250 

24 

40 

-  6 

-15 

90 

28 

50 

-  2 

-  5 

10 

32 

60 

+  2 

+  5 

10 

36 

70 

+  6 

+  15 

90 

40 

80 

+  10 

+25 

250 

6)180 

6)330 

a;y=700 

SO  =  mean 

55  =  mean 

X' 

100 

36 

4 

4 

36 

100 

280 


625 
225 
25 
25 
225 
625 

1750 


Substituting  formula, 


r=- 


■.xy 


700         ^700 
~700^ 


V280X  1,750 

Therefore  the  correlation  is  perfect.  A  perfect  negative 
correlation  may  be  illustrated  in  the  same  way. 

2.  Illustrating'  the  Computation  of  the  Coefficient  of  Cor- 
relation: Data  Complex  and  Grouped  in  Class  Intervals. — 
In  Tables  XXXV,  XXXVI,  and  XXXVII  we  iUustrated 
the  computation  of  the  coefficient  of  correlation  by  simple 
problems  where  the  number  of  pairs  of  values  were  five  and 
six.  It  is  evident  that,  if  we  had  hundreds  of  pairs  of 
values  and  attempted  to  use  the  same  procedure,  the  amount 
of  labor  in  the  computation  would  be  very  great.  A  glance 
at  the  Pearson  formula  shows  that  the  coefficient  of  corre- 
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lation  is  found  in  terms  of  measures  of  central  tendency 
and  measures  of  variability;  that  is,  we  compute  the  mean 
of  each  series,  find  the  deviation  of  each  measure  from  its 
mean,  multiply  it  by  the  corresponding  deviation  from  the 
mean  in  the  other  series,  and  find  the  sum  of  these  products 
for  the  numerator  of  the  fraction.  The  denominator  is 
the  product  of  the  standard  deviations  of  the  two  series. 
There  is  therefore  really  nothing  new  in  the  development 
of  this  formula,  since  we  showed  in  the  two  previous  chap- 
ters how  measures  of  central  tendency  and  variability  might 
be  found.  The  chief  concern  here  is  to  devise  a  distribution 
table  that  will  reduce  the  labor  to  a  minimum.  Two  plans 
will  be  presented. 

One  will  be  a  double  entry  distribution  table  which  is 
simply  a  device  for  the  arrangement  of  the  pairs  of  values 
to  facilitate  computation,  and  the  other  will  be  a  short 
method  for  computing  correlations  proposed  by  Dr. 
Leonard  P.  Ayres.^ 

We  shall  first  note  the  computation  of  the  coefficient  of 
correlation  by  the  traditional  method,  using  a  double  entry 
distribution  table,  and  at  the  same  time  take  advantage  of 
the  short  methods  in  computing  the  means  and  deviations 
developed  in  Chapters  IX  and  X. 

Table  XXXVIII  illustrates  the  tabulation  of  the  data 
and  the  computation  of  the  coefficient  of  correlation  with 
310  cases  in  the  subjects  of  Latin  and  mathematics.  In 
this  table  the  mathematics  is  known  as  the  first  or  subject 
series,  and  the  Latin,  the  relative  or  second  series.  The 
columns  and  rows  are  spoken  of  as  arrays,  the  columns  as 
y-arrays  of  type  x,  and  the  rows  as  x-arrays  of  type  y.     The 


*  "A  Shorter  Method  for  Computing  the  Coefficient  of  Correlation," 
Journal  of  Educational  Research,  Vol.  1,  March,  1920.  Also  "The 
Application  to  Tables  of  Distribution  of  a  Shorter  Method  for  Com- 
puting Coefficients  of  Correlation,"  Journal  of  Educational  Research, 
Vol.  1,  AprU,  1920. 
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size  of  the  class  intervals  is  determined  here  in  the  same 
way  as  it  was  in  the  previous  discussions  in  Chapters  IX 
and  X.  There  are  five  units  in  each  class  interval.  The 
limits  of  the  class  intervals  are  written  at  the  upper  and 
left-hand  margin  of  the  table.  The  number  of  class  inter- 
vals is  determined  by  the  range  of  the  grades;  from  40  to 
100  in  Latin,  and  from  60  to  100  in  mathematics.  Noting 
the  table  (first  row  at  the  top)  we  see  that  5  students  made 
grades  of  from  95  to  100  in  Latin  and  from  65  to  70  in 
mathematics;  20  made  grades  of  from  95  to  100  in  Latin 
and  from  80  to  85  in  mathematics,  and  so  on. 

The  heavy  black  horizontal  lines  marking  the  limits  of 
the  class  interval  70-74.99  designate  the  row  and  class 
interval  that  contains  the  assumed  mean  of  the  Latin 
series.  The  heavj^  black  vertical  lines  at  the  limits  of  the 
class  interval  80-84.99  mark  the  column  and  class  interval 
that  contains  the  assumed  mean  for  the  mathematics  series. 
The  small  figures  in  the  upper  right-hand  corner  of  the 
squares  are  the  deviations  from  the  assumed  mean  (in  units 
of  class  intervals)  of  the  subject  series,  multiplied  by  the 
corresponding  deviations  from  the  assumed  mean  of  the 
relative  series,  and  this  product  multiplied  by  the  number 
of  measures  in  the  class  interval.  Thus  the  —75  in  the 
second  column  from  the  left  and  the  first  row  from  the  top 
means  that  the  5  measures  in  that  square  have  a  deviation 
of  —3  from  the  assumed  mean  of  the  mathematics  series 
and  a  -{-5  from  the  assumed  mean  in  the  Latin  series. 
Hence:  —3X5X5=  —75.  The  other  figures  are  deter- 
mined in  a  similar  way.  The  fx-row  at  the  bottom  gives 
the  sum  of  the  measures  in  the  various  columns,  and  the 
/(/-column  at  the  right  gives  the  sum  of  the  measures  in  the 
various  rows.  The  Sx'^' =  column  is  the  sum  of  the  prod- 
ucts of  the  deviations  of  each  measure  from  the  assumed 

mean  in  each  series;  that  is,  —75+70=  —5;  — 72H 96  = 

— 168,  and  so  on. 
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Table  XXXVIII. — Illustrating  the  Computation  of  the  Coeffi- 
cient OF  Correlation  op  310  Cases  of  Latin  and  Mathematics 

Ability  in  Mathematics 


60 

32 
2 

2 

65 

_7S 
5 

_72 
6 

84 
14 

108 

12 
37 

70 

—  96 

12 
12 

75 

—  11 
11 

0 

79 

24 

12 
102 

80 

0 

20 

0 
35 

0 

1 

_30 
86 

85 

40 

20 

-81 

27 

47 

90 

70 

7 

48 
12 

19 

95 
46 

5 
5 

32 
18 
40 
32 
11 
80 
0 
26 
39 
2 
0 

30 
310 

d 

5 

4 

3 

2 

1 

0 

-1 

-2 

-3 

-4 

-5 

-6 

160 
72 

120 
64 

11 

427 

-  52 
-117 

-  8 

-180 
-357 

2 

800 
288 
360 
128 
11 

104 

351 

32 

1,080 
310)3,154 

10.174=52 
0.051=4 

Zx'ti' 

100 
95 

-    5 

94.99 
90 

-168 

89.99 

85 

45 

84.99 
80 

88 

79.99 
75 

-  11 

74.99 
70 

69.99 
65 

64.99 
60 

108 

59.99 
55 

27 

54.99 
50 

32 

300 

49.99 
45 

-184 
310)116 

44.99 
40 

0.374= 

-? 

10.123=(t2 
3.181=ff» 

+++ 


I  I  I 


to      t^OOWS    f   o 
■^     ^  CO  •-<       o 


_  so      .-H 

M  1    d  d 


427 

310)  70 
0.226= 

Cj, 

0.051  = 

'I 

'^x^y= 

-0.105 

'3f'y= 

4.479 

0.374-(- 

-0.105) 

0.479 

4.4 

79 

4.479 

> 0.107    Am. 
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Summary  op  Steps  in  the  Computation  of  the  Coefficient 
OF  Correlation 

1.  Find  the  number  of  measures  in  each  series  and  designate 
the  number  by  A'^.     (N  will,  of  course,  be  the  same  in  each  series.) 

2.  Estimate  the  class  interval  that  contains  the  mean  and  take 
its  mid-point  as  the  assumed  mean  (the  same  as  was  done  in  find- 
ing the  mean  by  the  short  method  in  Chapter  IX).  For  example, 
70-74.99  for  the  y's  and  80-84.99  for  the  x's. 

3.  Tabulate  the  deviations  from  the  means  chosen  m  terms  of 
class  intervals.  For  example,  the  first  row  above  the  class  interval 
70-74.99  has  a  deviation  of  +1,  the  second  row  +2,  and  so  on. 
The  first  row  below  has  a  deviation  of  —1,  the  second  a  deviation 
of  —2.  The  first  column  to  the  right  of  the  mean  column  80-84.99 
has  a  deviation  of  +1,  the  second,  +2;  the  first  to  the  left  a  devia- 
tion of  —1,  and  the  second  a  deviation  of  —2,  and  so  on. 

4.  To  the  right  of  the  table  make  a  <i-column  to  record  the 
^/-deviations,  also  an  fdy-cohimn,  an  /d^-column  and  an  Sx'y'- 
column,  which  latter  is  the  sum  of  the  products  of  x  and  y  devia- 
tions calculated  from  an  assumed  mean.  Similarly  record  the  x 
deviations  and  other  similar  values  at  the  bottom  of  the  table. 

5.  Multiply  each  frequency  (/^/-column  for  the  y's)  by  its 
respective  de\'iation.  For  example,  32x5  =  160;  18x4=72; 
and  so  on,  retaining  the  algebraic  signs.  Find  the  fd's  for  the  a;'s 
in  a  similar  way. 

6.  Find  the  algebraic  sum  of  the  fdyS.  For  example,  Zfdy  = 
-357+427=70;  simUarly  S/(ix=  -245+100=  -145. 

7.  Divide  the  s/d's  by  the  number  of  cases,  N,  to  give  the 

70  145 

correction  c.    For  example,  Cy  =  ——  =  0.226;  Cx  =  — ^rr  =  -0.468. 

8.  Square  the  corrections  c^=  0.051;  c|  =  0.219. 

9.  Multiply  each  fdy  by  d,  its  corresponding  deviation,  gi\dng 
the  figures  ia  the  /d^-column.  For  example,  in  the  t/-series, 
160  X 5  =800;  72  x4  =288,  and  so  on.  In  the  x-series  -8X  -4  = 
32;  and  -lllx  -3=333. 

10.  Find  the  sum  of  the  fd^'s,  in  each  series.  For  example, 
2/d;  =3,154,  and  2/4  =  683. 

11.  Divide  each  of  those  sums  by  N,  the  number  of  cases,  to 
give  S^,  the  square  of  the  standard  deviation  of  each  distribution 

around  the  assumed  mean.    *S^  =  -^—-  =  10.174;  >Si  =  — -=2.203. 

olO  olO 
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12.  Subtract  c^,  the  square  of  the  correction,  from  5^,  and 
cl  from  the  square  of  the  correction  Si.  For  example,  10.174  — 
0.051  =  10.123  =<tI;  2.203  -0.219  =  1.984  =al 

13.  Find  the  square  roots  of  o-^  and  <tI.  For  example,  o-j,= 
3.181;  crx  =  lA08. 

14.  Compute  the  'Ex'y"s  by  findmg  the  sum  of  the  deviations 
of  the  measures  in  a  particular  row  from  the  mean  of  the  x's  of  the 
whole  table,  x.  This  gives  2x'.  Multiply  2x'  by  i/,  the  deviation 
of  this  particular  row  from  y,  the  mean  of  the  y's  of  the  whole  table. 
This  gives  Xx'y',  which  is  the  product-sum  of  the  de\dations  about 
the  two  assumed  means. 

This  would  give  us  the  numerator  of  the  fraction  in  the 
Pearson  formula  were  it  not  for  the  fact  that  both  the  x' 
and  y'  deviations  have  been  computed  from  assumed  means 
instead  of  the  true  means  and  hence  must  be  corrected. 
Since  the  means  were  in  error  by  the  corrections  Cx  and  Cy, 
so  each  deviation  from  them  on  y  and  x  must  be  in  error  by 
the  same  amount. 

Going  through  the  various  rows  and  finding  the  alge- 
braic sum  of  the  products  of  x'y'  we  get  —5,  —168,  45, 
etc.,  of  the  2x't/' =  column.  For  example,  adding  -75 
and  +70  gives  -5;  -72  and  —96  gives  -168,  and  so  on 
The  algebraic  sum  of  the  Sa;'?/'-column  =  116,  which,  when 

1 1  fi 
divided  by  the  total  number  of  measures,  A''  =  ^  =  0.374  = 

?^;   c„ar  =  0.226X -0.468= -0.105.      It   may   be   shown 

algebraically  that  Zxy,  the  deviations  of  x  and  y  from  their 

Xx'y' 
true  means  =  —^ CxC,,. 

We  may  therefore  write  the' Pearson  formula  thus:  ^ 


^  Let  Ex  and  Ey  represent  the  estimated  means  of  the  two  series  and 
Cx  and  Cy  be  corrections  to  be  applied  to  the  estimated  means  to  get 
the  true  means.  Then  the  true  means,  Mx  and  Mv,  are  respectively, 
Mx  =  Ex+cx  and  Mv=Ev+Cv. 

Let  X  and  y  be  deviations  from  the  true  means,  Mx  and  My. 

Let  x'  and  y'  be  deviations  from  the  estimated  means.  Ex  and  Ev. 
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-Lx'y'-NcxCy       N 


^x'y' 

CxCy 


NaxCy  CxCj, 

We  now  have  made  all  corrections  so  that  we  may  sub- 
stitute directly  in  the  formula, 


'./ 


_  i:x'y 

r—        j^  CxCy 

(Txf^y 

which  equals  the  Pearson  formula, 

N(7xO-y 

Substituting, 

^0.374- (-0.105)  _  0.479^ 
^        3.181X1.408        4.479 

3.  Illustrating  the  Computation  of  the  Coefiicient  of  Cor- 
relation by  the  Short  Method  (Adapted  from  Ayres) :  (a) 

Series  Simple  and  Ungrouped. — A  shorter  and  more  direct 


Thus 

x'  =  x+cx    and     y'  =  y+Cy. 
Therefore, 

Sx'y '  =  2  (x +cx)  {y  -\-Cv) 

=  Sxy+Cj^Sx+CiSy+SczCi/. 

Now  since  'Lx  and  Sj/  (the  sum  of  the  x  and  y  deviations  from  the 
true  mean)  each  =  0,  then 

Sx'y'  =  Sz2/+2ciCi,,     or     'Zxy  =  'Lx'y' —  'ZciCy. 

or,  substituting  this  expression  in  the  equation, 

'Lxy 

NaxCy 

we  get, 

'Lx'y'-NcxCy     Sxy 

^=        Naxay         =  -JT -"'''■ 

(Xx'Ty 

(Adapted  from  H.  L.  Rietz,  Bulletin  No.  148,  University  of  Illinois 
Agricultural  Experiment  Station,  1910.) 
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method  has  been  evolved  by  Ayres.*  We  shall  now  illus- 
trate the  computation  of  the  coefficient  of  correlation  by 
this  method  by  two  simple  series  with  data  ungrouped. 
The  method  used  for  finding  the  coefficient  of  correlation 
in  Table  XXXVIII  was  a  long  one  even  though  many 
"  short  cuts  "  were  used. 

In  Table  XXXV  we  gave  two  simple  series  to  illustrate 
the  computation  of  the  coefficient  of  correlation.  The 
process  will  be  repeated  in  part  to  show  the  advantages 
of  the  "  short  cuts  "  used  by  Ayres.     The  series  were: 

Subject  Series  Relative  Series 

3 5 

4 8 

5 7 

8 12 

10 13 

30 45 

The  mean  of  the  subject  series  is  6,  that  of  the  relative,  9. 
Calling  the  deviation  from  the  mean  in  each  series  x  and  y 
respectively,  and  multiplying  the  deviation  in  one  series 
by  its  corresponding  deviation  in  the  other  and  also  finding 
the  standard  deviations  of  the  two  series,  we  have: 

X  y  xy  x"-  y- 

-3  -4  12  9  16 

-2  -12  4  1 

-1-2214 
+2+3649 
+4  +4  16  16  18 

38  34  46 

which  when  substituted  in  the  Pearson  formula, 

r= — =0962 

"^    V34X46    ^-^^-^ 

The  shorter  method  proposed  by  Ayres  gives  the  siuns  of 
the  products  and  the  sums  of  the  squares  of  the  deviations 

"Ayres,  lor.,  cit 
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directly  from  the  squares  of  tne  original  numbers.  By  this 
method  it  is  not  necessary  to  arrange  the  measures  in  order 
of  their  magnitude  or  to  find  the  separate  deviations  from  the 
means.  Thus  is  avoided  the  necessity  of  taking  into  account 
the  plus  and  minus  signs  of  deviations. 

There  are  two  fundamental  principles  in  the  computa- 
tion of  the  coefficient  of  correlation  by  this  method  which 
the  student  should  master. 

1.  This  method  considers  every  number  in  the  series  as 
being  equal  to  the  mean  of  the  series  plus  a  plus  or  minus 
deviation  from  that  m^an.  Thus  3,  which  is  the  first  number 
in  the  first  series  in  the  above  illustration,  equals  the  mean 
of  the  series,  or  6,  plus  a  minus  deviation,  —3;  4  equals 
the  mean  of  the  series,  or  6,  plus  a  minus  deviation,  —2; 
and  so  on. 

2.  If  the  sum  of  the  squares  of  the  numbers  in  a  series  is 
found  and  from  it  is  subtracted  the  product  found  by  multi- 
plying the  square  of  the  mean  of  the  series  by  the  number  of 
cases,  the  remainder  will  be  the  sum  of  the  squares  of  the  devia- 
tions from  the  mean. 

For  example,  in  the  subject  series  given  above  the  numbers, 
their  means,  and  squares  are : 


3 

squared 

=     9 

4 

=  16 

5 

=  25 

8 

=  64 

10 

=  100 

5)30 

214 

Mean  =     6 

180 

Mean  squared  =  36 

34  the  sum  of  the  squares  of  the  de 

Number  of  cases  =     5 

viations  of  the  subject  items. 

180 


By  utilizing  this  method  the  coefficient  of  correlation  may 
be  computed  directly  from  the  products  of  the  squares  of 
the  items  of  the  two  series  without  finding  the  separate 
deviations. 
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The  operations  to  be  performed  may  be  expressed  as 
follows: 

r= 


^S^-(^'](^R^-^^^^' 


N  J\  N 

where  S  =the  individual  subject  items  as  3,  4,  5,  8,  10,  in 
the  subject  series  above 
R  =  the  individual  relative  items 
S  =the  sum 
*S2  =  the  square  of  each  individual  subject  item,  as  9, 

16,  25,  64,  100 
R^  =  the  square  of  each  individual  relative  item 
N  =  the  number  of  cases 

Making  the  corrections  and  substituting  in  the  formula 
with  data  used  above,  we  have 

308-55^ 
5 


V(2H-f)(45.J 


2,025 

31- 

308-270 


V(214-180)(451-405) 
38  38 


-    / Qo  ^  =  0.962 

V34X46    39.5 

In  order  to  make  the  above  substitutions  clear,  let  us  go 

through  the  various  steps  and  note  how  each  part  is  derived. 

The  308  is  the  sum  of  the  products  of  the  subject  and  relative 

items,  that  is:  3X5  +  4X8  +  5X7  +  8X12  +  10X13  =  308. 

30X45 
The  fraction  — ^ —  is  the  sum  of  the  subject  items,  30, 

multiplied  by  the  sum  of  the  relative  items,  45,  and  divided 
by  the  nmnber  of  cases,  5.     The  first  number  under  the 
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radical  sign,  214,  is  the  sum  of  the  squares  of  the  items  in 
the  subject  series.  The  fraction,  ^^,  does  not  seem  at 
first  to  carry  out  the  directions  in  the  second  principle 
stated  above  which  says  to  subtract  from  the  sum  of  the 
squares  of  the  numbers  the  product  found  by  multiplying 
the  square  of  the  mean  of  the  series,  36,  by  the  number  of 
cases,  or  5.  The  mean  is  6,  or  ^^,  which  when  squared  =  36, 
or  s^-,  and  when  multiplied  by  the  number  of  cases  gives 
180,  or  ^^.  The  other  numbers  under  the  radical  are 
found  in  the  same  way. 

(6)  Data  Complex  and  Grouped  in  Class  Intervals. — ^We 
shall  now  illustrate  the  computation  of  the  coefficient  of  cor- 
relation with  the  data  grouped  in  class  intervals.  To  show  the 
similarity  of  this  method  to  the  one  used  jn  Table  XXXVIII 
we  shall  use  the  same  data  that  were  used  in  that  table. 

The  data  should  be  arranged  in  a  correlation  table  the 
same  as  in  Table  XXXVIII.  At  the  left  of  the  table  (see 
Table  XXXIX),  and  at  the  bottom,  instead  of  inserting  the 
class  intervals  40  to  44.99,  45  to  49.99,  etc.,  as  was  done  in 
Table  XXXVIII,  insert  the  numbers  1,  2,  3,  etc.,  in  the 
column  marked  S.  In  order  to  keep  the  data  straight,  the 
class  intervals  may  be  inserted  if  desired  as  in  Table 
XXXVIII  and  the  numbers  in  column  S  inserted  later. 
It  should  be  noted  that  multiplying  the  frequencies  by  the 
numbers  1,  2,  3,  etc.,  gives  them  the  same  relative  values 
as  if  they  were  multiplied  by  the  mid-points  of  the  class 
intervals  42.5,  47.5,  52.5,  etc.  In  like  manner  insert  the 
figures  1,  2,  3,  etc.,  in  the  margin  at  the  top  of  the  table. 
When  data  are  grouped  in  class  intervals,  we  may  assume 
that  they  are  grouped  at  the  mid-point  of  the  class  interval. 
Thus,  in  the  first  row  at  the  top,  the  class  interval  is  95  to  100 
(see  Table  XXXVIII),  and  all  measures  are  assumed  to  be 
grouped  at  the  mid-point,  or  at  97.5.  The  class  interval  in 
the  second  row  is  90  to  94.99,  and  its  mid-point  is  92.5. 
Therefore  all  measures  in  the  top  row  may  be  said  to  have  a 
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value  of  97.5  and  in  the  next  row,  92.5,  and  so  on.  Similarly 
the  measures  in  the  columns  may  be  considered  as  grouped  at 
their  mid-points  as  in  Table  XXXVUI.  The  first  column 
to  the  left  would  then  have  a  value  62.5,  the  second  one 
67.5,  and  so  on.  To  compute  the  coefficient  of  correla- 
tion by  this  method  it  is  necessary  to  multiply  the  total 
number  of  measures  in  the  rows  and  columns  by  their 
magnitude.  Thus,  to  get  the  >S7'-column  at  the  right  of 
the  table,  we  accordingly  would  have  to  multiply  the  total 
number  of  measures  in  the  first  row,  summed  in  the  T- 
column,  by  the  magnitude  of  the  measures,  which  in  this 
case  would  be  97.5,  In  order  to  get  the  measures  in  the 
SST-Golmmi  we  would  have  to  square  the  97.5  and  multiply 
it  by  the  sum  of  the  measures  in  the  first  row,  or  32.  But 
in  order  to  avoid  the  squaring  of  large  numbers  and  also 
the  multiplication  by  large  numbers,  we  insert  the  num- 
bers 1,  2,  3,  etc.,  instead  of  the  mid-values  of  the  class 
intervals. 

The  T,  RT,  and  RRT  rows  at  the  bottom  are  found  in  the 
same  way  as  the  T,  ST,  and  SST  columns  at  the  right. 
(Use  Table  I,  Appendix,  for  squaring  the  numbers.) 

The  ZSf-row  at  the  bottom  of  the  table  is  found  by  multi- 
plying the  frequencies  /  by  the  subject  series,  S,  finding  their 
sum  and  recording  in  the  proper  squares  below.  Thus  in 
the  first  column  3X2  =  6;  in  the  second  column,  12X5  =  60; 
11X6  =  66,  5X14  =  70;  4X12=48;  604-66+70+48  =  244. 
The  other  numbers  in  the  row  are  found  in  a  similar  way. 

The  R  (2S/)-row  is  found  by  multiplying  the  numbers 
in  the  2 Sf -vow  by  R,  the  relative  series.  Thus  1X6  =  6; 
2X244=488;  3X132  =  396,  etc. 

In  the  computation  below  the  diagram  the  numbers  are 
taken  from  the  totals  at  the  right  and  the  bottom  of  the 
columns  and  rows  respectively.  Substituting  the  values  in 
the  Pearson  formula  we  have,  f  =  0.107,  the  same  as  by  the 
other  method  on  page  334. 
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Table  XXXIX. — Illustrating  the  Computation  of  the  Coeffi- 
cient OF  Correlation  by  the  Short  Method 

(Adapted  from  Ayres) 

Subject  Series 


%• 

2 

3 

4 

5 

6 

7 

8 

T 

ST 

SST 

12 

5 

20 

7 

32 

384 

4,608 

11 

6 

12 

18 

198 

2,178 

10 

35 

5 

40 

400 

4,000 

9 

20 

12 

32 

288 

2,592 

8 

11 

11 

88 

704 

7 

79 

1 

80 

560 

3,920 

6 

0 

0 

0 

M 
V 

•c 

5 

14 

12 

26 

130 

650 

> 

4 

12 

27 

39 

156 

624 

3 

2 

2 

6 

18 

2 

0 

0 

0 

1 

30 

30 

30 

30 

T 

2 

37 

12 

102 

86 

47 

19 

5 

310 

2,240 

19,324 

RT 

2 

74 

36 

408 

430 

282 

133 

40 

1,405 

2,240 

=  7.226 
=  4.532 

310 
1,405 

RRT 

2 

148 

108 

1,632 

2,150 

1,692 

931 

320 

6,983 

310 

XSf 

6 

244 

132 

701 

627 

288 

192 

50 

i2(s5/) 

6 

488 

396 

2,804 

3,135 

1,728 

1,344 

400 

10,301 

2,240X4.532  =  10,151.68  10,301-10,151.68=    150 

2,240X7.226  =  16,186.24  19,324-16,186.24  =  3,138 

1,405X4.532=  6,367.46  6,983-  6,367.46=    616 


150 


V3,138X616 


=  0.107 
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We  may  fm-ther  clarify  the  process  by  comparing  the 
steps  here  with  the  simpler  problem  solved  by  the  Ayres 
short  method  on  page  340.  The  fraction  -2^*^  =  7.226,  the 
mean  of  the  relative  series.  -^^,^=4.532,  the  mean  of  the 
subject  series.  It  should  be  noted  that  in  each  case  it  is 
the  sum  of  the  values  of  the  individual  measures  divided  by 
the  number  of  cases.  10,301  =  S(>SXi^)  in  the  general 
formula  and  corresponds  to  the  308  in  the  simple  problem. 

2,240X4.532=10,151.68  =  ^^^^  of  the  general  formula; 

y  r> 

the  2,240  being  ZS  and  the  4.532  being  the  -^.     10,301- 

10,151.68  =  150,  the  simplified  numerator  of  the  fraction 
corresponding  to  38  in  the  simple  problem.  6,983  =  2*S^  in 
the  denominator  of  the  general  formula  and  19,324  =  27^-; 

2,440X7.226  =  16,186.24  =  -^  and   1,405X4.532=6,367.46 

N  ' 

4.  Representing-  the  DegTee  of  Correlation  Between 
Two  Traits  by  the  Graphic  Method:  (a) Data  Simple  and 
Ungrouped. — In  making  a  correlation  table  and  inserting 
the  values  we  must,  of  course,  deal  with  each  pair  of  cor- 
related values  separately  in  order  to  locate  them  in  the 
right  place  in  the  correlation  table.  In  plotting  pairs  of 
measures  we  construct  two  coordinate  axes,  one  horizontal 
(OX),  and  the  other  vertical  {OY),  meeting  at  an  origin,  or 
beginning  point,  at  the  bottom  and  left.  On  these  axes 
we  lay  off  scales  representing  the  traits  in  question.  The 
scales  need  not  be  made  up  of  the  same  units.  If  we  desire 
to  find  the  correlation  between  the  heights  and  weights  of 
individuals,  for  instance,  we  would  lay  off  one  scale  on,  say, 
the  OX-axis,  to  represent  height  and  the  other  on  the  OY- 
axis  to  represent  weight.  The  units  of  the  scale  are  laid 
off  from  the  origin  to  the  right  on  OX  and  upward  on  OY. 
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When  one  desires  to  insert  a  pair  of  measures  in  a  correla- 
tion table  thus  constructed,  he  finds  the  proper  place  on  the 
scale  on  the  X-axis  for  one  trait  and  the  proper  place  on  the 
F-axis  for  the  corresponding  trait.  He  then  erects  per- 
pendiculars at  each  of  the  points  thus  located  and  at  the 
intersection  of  these  perpendiculars  a  dot  or  small  cross  is 
made,  which  represents  the  pair  of  values  in  the  correlation 


Figure  X. 


ILLUSTRATING  THE  DISTRIBUTION  OF  CORRELATED  ABILITIES 
IN  LATIN  AND  MATHEMATICS 


table.  In  Table  XL  below  are  given  the  grades  made  by 
15  pupils  in  mathematics  and  Latin.  Figure  X  shows  the 
location  of  these  15  pairs  of  values  in  a  correlation  table. 
Pupil  A  made  a  grade  of  95  per  cent  in  mathematics  and  93 
per  cent  in  Latin.  In  inserting  these  values  we  find  the 
place  on  the  X-axis  that  is  marked  95  and  erect  a  perpen- 
dicular to  the  X-axis  at  that  point.     Similarly,  we  find  the 
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point  on  the  F-axis  that  corresponds  to  a  grade  of  93  per 
cent  and  erect  a  perpendicular  to  the  F-axis  at  that  point. 
At  the  intersection  of  these  two  perpendiculars  we  make  a 
small  cross  which  represents  the  pair  of  values  in  the  corre- 
lation table.  The  other  grades  are  similarly  located  in  the 
table. 

Table  XL. — Grades  Made  by  Fifteen  Pupils  in   Mathematics 

AND  Latin 


Pupil 

Mathematics 

Latin 

A 

95 

93 

B 

93 

93 

C 

90 

94 

D 

88 

89 

£ 

87 

85 

F 

70 

95 

6 

80 

82 

H 

86 

85 

I 

81 

79 

J 

75 

70 

K 

70 

65 

L 

69 

71 

H 

65 

69 

N 

60 

60 

O 

55 

60 

Many  Pairs  of  Values  With  Data  Grouped  in  Class 
Intervals. — If  perpendiculars  were  erected  at  the  junction 
points  of  each  of  the  class  intervals  on  both  the  X-  and 
F-axes  in  Figure  X,  we  would  have  a  convenient  device  for 
tabulating  any  number  of  pairs  of  values  in  the  correlation 
table.  When  data  are  thus  grouped  in  class  intervals, 
we  make  the  same  assumption  we  did  in  the  computation 
of  the  mean  and  median,  namel}'-,  that  all  the  measures 
are  grouped  at  the  mid-point  of  the  class  interval.  Con- 
sidering any  one  square  in  the  distribution  table  we  assume 
that  the  measures  are  grouped  at  the  mid-point  of  the 
class  interval  for  Latin  abilities  and  also  for  mathematics 
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abilities,  hence  all  the  measures  in  a  square  are  considered 
to  be  grouped  at  the  mid-point  of  the  square.  When  the 
data  are  thus  recorded  in  the  correlation  table  we  count 
the  number  of  measures  in  each  square  and  insert  the  proper 
figure  to  represent  these  measures. 

We  are  now  ready  to  make  an  estimate  of  the  degi'ee  of 
correlation  existing  between  the  two  traits.  This  may  be 
done  by  drawing  a  line  that  most  closely  approximates  the 
general  scattering  of  the  pairs  of  measures  over  the  table. 
In  drawing  such  a  hne  we  must  take  cognizance  of  the 
number  of  measures  in  each  square.  The  line  OD  is  the  best 
fitting  line  for  the  15  pairs  of  values  in  Table  X  and  is  known 
as  the  correlation  line. 

It  is  evident  that  this  method  is  inaccurate,  because  we 
could  not  take  two  similar  correlation  tables  and  tell  which 
had  the  higher  degree  of  correlation,  for  the  position  of  the 
line  OD  is  only  estimated  and  not  determined  accurately. 
The  measures  might  be  rather  similar  in  "  scatter  "  but  one 
would  be  unable  to  tell  the  exact  degree  of  correlation  in 
either  table. 

In  order  to  represent  both  graphically  and  accurately 
the  degree  of  correlation  existing  between  two  traits,  we 
must  resort  to  a  mathematical  formula,  such  as  the  one 
devised  by  Pearson,  that  will  take  cognizance  of  the  exact 
value  of  each  measure  in  the  correlation  table.  A  line 
drawn  that  most  closely  approximates  the  general  scattering 
of  the  pairs  of  measures  in  the  table  will  bear  a  constant 

relation  to  the  X-axis,  which  is  expressed  by  the  fraction  -. 
But  the  ratio  -  is  the  tangent  of  the  angle  made  by  the  cor- 

X 

relation  line  and  the  X-axis  (measured  from  the  F-axis  if 
the  angle  is  more  than  45  degrees.)     We  may  take,  there- 

fore,  any  value  of  -  (which  will  be  a  value  between  0  and  (1) 
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and  with  a  table  of  natural  tangents  determine  the  number 
of  degrees  the  line  of  correlation  makes  with  the  X-axis. 
The  line  may  then  be  drawn  accurately  in  the  correlation 
table. 

We  may  further  illustrate  the  graphic  method  of  expres- 
sing correlation  by  referring  to  Figure  XI.  In  that  figure 
let  the  line  XOX'  be  the  mean  of  abilities  in  mathematics 
and  the  Une  Y'OY  the  mean  of  the  abilities  in  Latin.     Then 


Figure  XI 

it  is  evident  that  if  a  student  who  was  5  units  above  the 
mean  in  Latin  was  also  5  units  above  the  mean  in  mathe- 
matics, and  that  if  one  3  units  above  the  mean  in  Latin  was 
3  units  above  in  mathematics,  and  the  same  ratio  prevailed 
for  all  students  both  above  and  below  the  means  of  the  two 
abilities,  a  curve  plotted  from  these  data  would  be  a  straight 
line  and  would  bisect  the  angle  Y'OX'-,  that  is,  it  would  make 
an  angle  of  45  degrees  with  the  X-axis,  and  the  correlation 
would  be  perfect.     If,  however,  the  correlation  were  not 
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perfect,  then  the  line  would  make  some  other  angle  with 
the  X-axis,  depending  on  the  degree  of  correlation  between 
the  two  traits. 

If  the  correlation  were  zero,  the  line  would  take  the 
position  OY^  or  OX'.  This  would  mean  that  any  given 
change  in  a  y-vahie  would  be  accompanied  by  no  change 
m  an  a;-value  and  vice  versa.  If  the  correlation  were 
negative,  then  the  line  would  swing  to  the  left  of  the  line 
OY'  taking  any  position  as  ON'.  This  would  mean  that 
as  the  y-values  grow  larger  the  a;-values  would  grow  smaller. 
If  we  have  a  perfect  negative  correlation,  the  line  ON' 
would  make  an  angle  of  45  degrees  with  the  X-axis. 

.It  thus  may  be  seen  then  that  the  degree  of  correlation 
existing  between  two  traits  is  expressed  by  the  angle  that 
the  hne  of  correlation  makes  with  the  X-axis.  If  the  x- 
values  increase  more  rapidly  than  the  y-values,  the  line  of 
correlation  will  lie  in  the  angle  DOX'.  If  the  ^/-values 
increase  more  rapidly  than  the  x-values,  it  will  lie  in  the 
angle  DOY'.  It  should  be  noted  that,  since  the  Hues  XOX' 
and  YOY'  represent  the  means  of  the  two  traits,  any  x~ 
value  greater  than  the  mean  will  lie  to  the  right  of  line, 
YOY'  and  will  deviate  from  the  mean  positively,  while  a 
value  below  the  mean  will  He  to  the  left  of  the  line  YOY' 
and  will  be  a  negative  deviation.  In  like  manner  any  ?/- value 
above  the  mean  will  have  a  positive  deviation  from  the  mean 
and  will  lie  above  the  XOX'  line,  while  a  value  less  than 
the  mean  will  lie  below  XOX'.  The  signs  for  the  four 
quadrants  may,  therefore,  be  represented  thus: 


Second  Quadrant 

x=  — 

y  =  + 

First  Quadrant 

x  =  + 
y  =  + 

Third  Quadrant 

x=  — 

y=- 

Fourth  Quadrant 
x=  + 

y=- 
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When  the  correlation  was  not  perfect,  the  degree  of  corre- 
lation was  expressed  by  Galton  thus:  Draw  any  horizontal 
line  AC  cutting  the  Hne  of  perfect  correlation  OD  at  B  and 

AB 
the  line  ON  at  C.     Then  the  ratio  -7-?^  measures  the  amount 

of  correspondence  in  change  in  the  two  variables.     A  given 

change  in  the  size  of  y  is  accompanied  by  a  proportional 

change  in  the  size  of  X.    When  the  hne  ON  swings  to  the 

AB 
position  OD,  then  ^-^  =  1  and  the  correlation  is  perfect  and 

AB 

positive.     When  ON  takes  the  position  OY,  the  ratio  -jj^  = 

infinit}^  since  AC  will  equal  zero;  that  is,  a  given  change 
in  the  size  of  y  is  accompanied  by  no  change  in  the  size  of  x. 
When  the  hne  ON  swings  to  the  left  of  the  line  YOY',  the 
degree  of  correlation  is  measured  in  a  similar  way  from  that 

AB 

quadrant  but  the  correlation  is  negative.     The  ratio  -j^  is 

called  the  coefficient  of  correlation  and  is  denoted  by  r. 

We  shall  now  note  another  very  important  term  in  the 
comparison  of  traits.  Galton  found  in  measuring  the 
heights  of  individuals  that  if  a  group  of  parents  were  found 
to  be,  say  ^/-inches  above  or  below  the  mean  of  the  race  in 
stature  that  the  mean  stature  of  their  children  would  not 
deviate  ?/-inches  from  the  mean  of  the  race,  but  would 
deviate  only  2/3^-inches  above  or  below  the  mean  of  the 
race.  In  expressing  this  fact  Galton  said  that  the  off- 
spring tended  to  "  regress  "  towards  the  mean  of  the  race. 
Since  then  it  has  been  common  to  speak  of  the  line  of  means 
of  the  correlation  table  as  the  line  of  regression.  Since  there 
is  a  line  of  means  of  the  columns  and  also  one  for  the  rows, 
there  will,  of  course,  be  two  lines  of  regression.  The  reader 
should  note  that  the  regression  lines  are  not  the  same  as 
the  line  that  represents  the  coefficient  of  correlation.  Each 
is  expressed  by  a  different  equation,  as  will  be  shown  later. 
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Finding-  the  Equation  of  a  Straight  Line  of  Regression. 

— The  most  definite  way  to  describe  a  line  is  to  write  its 
equation.  Since  the  relationship  of  most  of  the  educational 
data  maj^  be  expressed  by  a  straight  Hne,  we  shaU  note  how 
to  write  the  equation  of  a  straight  line.  In  order  to  do  this 
we  must  be  able  to  put  two  variables  as  x  and  y  together  in 
an  algebraic  equation  in  such  a  way  that  a  given  change 
in  the  value  of  one  is  accompaniesd  by  a  proportional  change 


P^-) — J — ID 


FiGUKE  XII 


in  the  value  of  the  other.  Let  us  write  the  equation  of  the 
Ime  PQ  in  Figure  XII.  The  equation  of  a  line  may  be 
written  if  we  know  the  value  of  any  two  points  on  it.  From 
Figure  XII  we  note  that  point  Q  has  an  x-value  of  +5  and 
a  y-value  of  +1.  These  values  are  measured  from  point  0, 
the  origin  of  the  graph.  The  distance  QM  is  called  the 
ordinate  of  point  Q  and  the  distance  QN  is  called  the  abscissa 
of  point  Q.    The  points  (5,  1)    are  called  the   coordinates 
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of  Q,  and  in  like  manner  (—5,  +7)  are  called  the  coordinates 
of  P.  The  axes  are  spoken  of  as  coordinate  axes.  The 
equation  of  the  line  PQ  is  Sx-\-5y  =  20.  For  any  two  points 
on  the  line  PQ,  the  ratio  of  the  difference  of  their  ordinates 
to  the  difference  of  their  abscissas  may  be    expressed    as 

y^~y^^  It  is  evident  that  this  ratio  is  constant  for  all 
xi— 2:2 

points  on  the  line.  This  ratio  is  the  tangent  of  the  angle 
that  the  line  PQ  makes  with  the  line  CD.  (The  tangent  of 
an  angle  is  the  side  opposite  the  angle  divided  by  the  adjacent 

side.)     Since  the  ratio —  measures  the  incUnation  of 

a;i— 0:2 

the  Hne  PQ,  it  is  called  the  slope  of  the  line. 

Let  point  Q  have  an  a^-value  of  plus  5  and  a  y-value  of 

plus  1  (+5,  1)  and  point  P  have  an  x-value  of  minus  5  and 

a  y- value  of  plus  7  (—5,  +7).     Since  point  P  has  an  x  and 

y  value  and  point  Q  also,  the  points  may  be  designated 

P{xi,yi)  and  Q{x2,y2),  xi  and  X2  being  the  abscissas  of  points 

P  and  Q  respectively,  and  t/i  and  1/2,  the  ordinates.     The 

slope  of  the  line  PQ  equals  ^— — —.    Let  Pi{xy)  be  any  other 

X\—X2 

point  on  the  line  PQ.  Then  the  slope  of  P^P  will  be  ^^^, 
since  Pi,  P,  and  Q  are  on  one  line,  slope  PiP  =  slope  PQ. 
Hence  we  have  the  formula  ^— Ml  =  ?/l — ^.     Substituting  in 

X  —  Xi      Xi  — X2 

the  equation  the  values  for  the  coordinates  of  points  P 

and  Q,  we  have  ^^  =  r— H"  =  ^^  =  ~~T?^-    Clearing  frac- 
'  x-\-b     5—5     x-\-o     —10 

tions  we  have  3x+5y  =  20,  which  is  the  equation  of  the  line 

desired. 

We  now  desire  the  equation  of  a  line  that  will  take 

cognizance  of  the  "  scatteration  "  of  the  measures  from  the 

means  of  the  corresponding  rows  and  columns.     We  noted 

that  the  degree  of  correlation  is  measured  in  terms  of  the 
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relative  amounts  of  deviation  of  each  point  from  the  mean 
of  the  column  and  from  the  mean  of  the  row  in  which  it 
falls.  In  the  measurement  of  dispersion  from  the  means 
of  the  columns  and  the  rows  we  must  use  the  same  unit  of 
deviation,  if  we  desire  the  amounts  of  dispersion  to  be 
comparable.  The  value  of  standard  deviation  having  been 
demonstrated  in  Chapter  XI,  under  the  heading  of 
"  Measurement  of  Variabilit}'',"  as  the  best  measure  of 
dispersion,  it  is  therefore  employed  to  measure  dispersion 
in  writing  the  equation  of  the  line  of  regression. 

We  know  that  the  line  that  "  best  fits  "  the  means  of  the 
arrays  is  that  line  from  which  the  deviations  of  the  means 
are  the  least  possible.  Professor  Karl  Pearson,  who  devel- 
oped the  equation  of  the  line,  employed  the  method  of 
least  squares  (discussed  in  a  previous  chapter)  in  the  loca- 
tion of  this  line;  that  is,  he  located  the  line  in  such  a  position 
that  the  sum  of  the  squares  of  the  deviations  of  the  means 
of  the  arrays,  each  weighted  by  the  number  of  measures  in 
the  respective  arrays,  would  be  the  minimum. 

Pearson's  Equation  for  a  Line  of  Regression. — ^Pearson 
deduced  the  equation  of  the  '"'  best  fitting  "  hne  as: 

yi-y  =  r-(xi-x). 

ax 
Meaning  of  Symbols  Used 

^=the  mean  of  the  columns 

z  =the  mean  of  the  rows 
?/i  =a  particular  measure  in  the  columns 
Xi  =&  particular  measure  in  the  rows 

AB 

r    =the  coefficient  of  correlation  designated  by  -r-^  i^i  figure 

o-y=the  standard  deviation  of  the  y  series 
cTx  =the  standard  deviation  of  the  x  series 

The  equation  may  be  written  in  a  condensed  form  as 
y  =  r-x, 

(Xx 
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in  which  y  =  the  deviation  of  a  particular  ^/-measure  from 
its  mean,  ovy  =  yi-y.     Similarly  x  =  Xi-x. 

We  noted  above  that  the  slope  of  the  Hne  was  repre- 
sented by  the  ratio  -  if  the  line  passes  through  the  origin  of 

X 

the  graph.     Let  this  value  be  represented  by  m;  that  is, 
m = -  and  y  =  trvx.    We  also  note  that 

X 

(7, 

y  =  r—x 

Therefore, 

Cj/  -,  <yy 

mx  =  r—  X,  and  m  =  r  — 

(Tx  '^x 


<r„ 


The  slope  of   the  line  is  therefore   represented  by  r  — . 

The  expression  r  —  is  known  as  the  regression  coefficient  of 

y  on  a;;  that  is,  the  deviation  of  y  corresponding  on  the  aver- 
age to  a  unit  change  in  the  type  of  x.     If  we  used  the  other 

regression  line,  the  slope  would  be  expressed  r  — ,  which  is 

the  regression  coefficient  of  x  on  y  and  means  that  deviation 
of  X  which  corresponds  to  a  unit  change  in  the  type  of  y. 
It  should  be  noted  that  in  a  perfect  correlation,  that  is, 
where  the  variability  of  the  two  traits  is  the  same,  there 
are  no  regression  lines  and  m  =  r. 

Let  us  see  what  regression  coefficients  mean  m  the 
problem  we  have  solved  above,  Table  XXXVIII. 

There  it  was  found  that  r= 0.107    (rx=1.40S    0-^=3.181. 

r-"  =  0  107  ^4§K  =  0.107X2.359  =  0.241   regression  coeffi- 
(Xx  1.408 

cient  of  y  on  x. 

r- =  0.107  J4et=0-047  regression  coefficient  of  x  on  y. 
Cy  3.181 

7/  =  0.241a;,  which  means  that  for  eveiy^unit  deviation 
from  the  type  of  x  (ability  in  mathematics)  it  is  most  prob- 
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able  that  there  will  be  an  accompanying  deviation  of  0.241 
as  much  in  y  (ability  in  Latin) . 

a:  =  0.047?/,  which  means  that  for  every  unit  of  deviation 
from  the  type  of  y  (ability  in  Latin)  it  is  most  probable  that 
there  will  be  an  accompanying  deviation  of  0.047a;  (ability 
in  mathematics). 

The  correlation  coefficient  is  the  geometric  mean  of  the 
two  regression  coefficients.  That  is,  the  square  of  the  cor- 
relation coefficient  is  equal  to  the  product  of  the  two  regres- 
sion coefficients.  Let  us  represent  the  regression  coefficients 
by  Pi  (rho)  and  P2  respectively.  Then  r^  =  PiXp2-  This 
serves  as  a  valuable  check  on  the  computation  of  these 
coefficients.  In  the  problem  cited  above  r^  =  0.114  and 
Pi Xp2  =  0.113.  The  difference  is  due  to  dropping  small 
fractions  in  the  computation. 

The  regression  coefficients  may  be  taken  directly  from 
the  values  computed  in  the  Ayres  Short  Method  also.     In 

1  ^0 
Table  XXXIX  it  was  shown  that  r=    ,  ==0.107. 

V3, 138X616 

The  fraction  ^^^^  is  the  regression  coefficient  of  x  on  y  and 

the  fraction  |^  is  the  regression  coefficient  of  y  on  x.     The 

following  computation  shows  that  these  fractions  are  equal 

to  the  regression  coefficients.     Let  the  regression  coefficient 

r—  be  represented  by  pi .     Then  since  r  =  -^ — , 

(Ty  IS  ax(Ty 


^xy  ^a^  ^xy  ^^  N      llxy       150 


r^ 


N<rx<ry    a,  /2a;2   j^y2         j^y2     2r    3,138 

In  Hke  manner  it  may  be  shown  that  P2  is  equal  to  the  fraction 


616- 


Coefficients  of  regressions  may  be  made  to  yield  valu- 
able information  in  many  ways.  They  may  be  used  in 
studying  such  questions  as  regularity  of  attendance  and 
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promotion  rates;  progress  through  the  grades  and  expendi- 
ture for  supervision;  time  spent  on  spelling  drills  and 
scores  made  in  spelling  ability;  years  of  schooling  and 
earnings  up  to  a  certain  age.  In  fact,  coefficients  of  regres- 
sions may  be  made  to  yield  very  valuable  information  along 
a  great  many  school  lines.  In  the  correlation  between 
Latin  and  mathematics  (cited  above)  we  may  use  regres- 
sion coefficients  to  gain  information  of  the  following  type. 

Suppose  a  student  makes  a  grade  of  85  per  cent  in  mathe- 
matics, what  is  the  most  probable  grade  he  will  make  in 
Latin?  From  the  computation  of  the  correlation  and 
regression  coefficients  in  Table  XXXVIII  we  have  the 
following  data: 

0 .  107  =  r,  the  coefficient  of  correlation, 
1.408=     the  standard  deviation  of  the  mathe- 
matics or  subject  series, 
3.181=     the  standard  deviation  of  the  Latin  or 
relative  series; 

73.09  =     the  mean  of  the  Latin  series; 

80 .  16  =  the  mean  of  the  mathematics  series. 
We  also  have  2/  =  0.241a;  which  is  the  equation  for  the 
regression  of  y  on  x.  Since  the  mean  of  the  subject  series  is 
80.16,  a  student  who  gets  a  grade  of  85  in  mathematics 
would  be  4.84  above  the  mean.  In  Figure  XIV  we  have 
the  means  of  the  two  series  with  the  line  ST  passing  through 
the  point  of  their  intersection.  The  hne  ST  is  the  hne 
whose  equation  is  ^  =  0 .  241a;.  Since  this  Hne  passes  through 
the  origin  of  the  graph  (the  intersection  of  the  two  lines  of 
means)  we  may  assume  any  values  for  x  and  compute  the 
corresponding  values  for  y.  If  we  assume  x  to  be  1,  then 
2/ =  0.241.  This  means  that  as  the  re- values  increase  one 
unit  above  the  mean  of  the  subject  series,  that  is,  the  YOY' 
axis,  the  y-values  will  increase  0.241  as  much  above  the 
mean  of  the  relative  series,  that  is,  the  XOX'  axis.  When, 
therefore,  we  assume  a  value  of  85  for  x  which  is  4 .  84  above 
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the  mean  of  the  subject  series,  the  value  of  y  will  be  0.241 
as  much  above  the  mean  of  the  relative  series.  That  is 
0.241X4.84  =  1.166  which,  when  added  to  the  mean  of 
the  Latin  series,  7 .  309  =  74 .  26,  which  is  the  most  probable 
grade  that  would  be  made  in  Latin. 

The  general  procedure  for  predicting  ability  in  one  trait 
when  the  correlation  between  two  traits  and  abiHty  in  one 
are  given  is  as  follows:  Let  the  trait  in  which  the  abihty  is 
given  be  the  subject  series.  Compute  the  means  of  both 
series  and  write  the  equation  for  the  regression  line  using 
the  regression  coefficient  for  y  on  x.  Assume  any  value  for 
X,  as  85  in  the  problem  above.  Determine  how  much  this 
is  above  or  below  the  a;-mean. 

Then,  in  the  equation  for  the  regression  line,  assume  x  to 
be  1  and  compute  the  value  for  y  which  is  0 .  241  in  the  prob- 
lem above.  Multiply  the  difference  between  the  mean  for 
the  rc-series  and  the  assumed  x-value  (85—80.16)  by  the 
value  of  y  and  add  the  product  to  the  i/-mean. 

It  should  be  noted  also  that  the  approximate  values  may 
be  read  directly  from  the  graph  of  the  regression  lines. 
From  Figure  XIV  we  note  that  as  we  move  5  units  to  the 
right  of  the  subject  mean  the  regression  hne  has  risen  0.241 
as  much  above  the  relative  mean.  We  may  therefore 
assume  any  value  for  x  and  read  the  corresponding  values 
for  y  directly  from  the  graph. 

Assuming  values  for  the  mathematics  series  from  80  to 
100,  increasing  steps  of  5,  we  may  accordingly  write  the 
corresponding  values  for  y  as  follows: 

The  Regression  of  ?/  on  a; 
Assumed  Corresponding 

x-values  y-values 

80       73.05 

85       74.26 

90       75.46 

95       76.67 

100       78.87 
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If  it  is  desired  to  assume  values  for  the  Latin  in  the  above 
problem  and  compute  the  corresponding  mathenaatics 
grades,  we  use  the  regression  of  a;  on  ^  and  the  equation  for 
the  other  regression  line,  which  is,  x  =  0. 047?/. 

The  hne  PQ  (Fig.  XIV)  is  the  line  desired.  It  is  meas- 
ured from  the  YOY'  axis  because  we  desire  the  regression  of 
X  on  y.  It  should  be  noted  that  we  can  assume  any  value 
for  Latin  and  read  off  the  approximate  mathematics  grade 
directly  from  the  graph,  the  same  as  with  the  other  regression 
line. 

The  value  of  the  correlation  coefficient,  r,  is  always  the 
value  of  the  tangent  of  the  angle  that  the  correlation  line 
makes  with  the  X-axis  where  it  passes  through  the  point  of 
intersection  of  the  X  and  Y  axes.  It  is  never  equal  to  the 
regression  lines.  If  the  correlation  is  perfect,  there  is  no 
regression,  hence  no  regression  lines.  In  the  above  prob- 
lem r= 0.107.  Referring  to  a  trigonometric  table  of  natural 
tangents  we  find  that  0.107  is  the  tangent  of  the  angle  6°  7'. 

Therefore,  if  we  desired  to  draw  the  correlation  line  in 
the  above  correlation  Table  XXXVIII,  it  would  make  an 
angle  of  6°  7'  with  the  X-axis. 

The  significance  of  the  tangent  in  determining  the  slope 
of  the  correlation  line  and  also  the  reasons  why  the  value 
of  r,  the  coefficient  of  correlation,  may  vary  from  —  1 
through  0  to  +1  may  be  made  clearer  by  geometry  in  the 
following  illustration. 

About  0  as  a  center,  describe  a  circle  with  a  radius  equal 
to  1. 

Let  XOX'  be  a  diameter  of  the  circle  and  YOY'  another 
diameter  perpendicular  to  it.  Let  CX'  be  a  geometrical 
tangent  to  the  circle  0  drawn  perpendicular  to  the  diameter 
XOX',  and  let  OC  be  a  line  bisecting  the  angle  YOX'  and 
intersecting  the  tangent  CX'  at  C.  Then,  by  geometry,  we 
know  that  the  triangle  COX'  is  a  right  angle  triangle,  that 
the  angles  COX'  and  X'CO  are  each  45°,  and  that  the  side 
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CX'  =  the  side  OX'.  By  construction  0Z'=1.  The  line 
CX'  therefore,  equals  1,  and  since  the  line  CX'  is  tangent 
to  the  circle  at  X' ,  the  numerical  value  of  such  a  tangent, 
limited  by  the  point  of  intersection  of  a  line  forming  an 
angle  of  45**  with  the  horizontal  axis,  is  1. 

From  0  draw  the  hne  OC,  making  the  angle  COX'  equal 
to  6°  1'.    Now  since  CX'  =  OX'  =  l,  and  since  the  value  of 


Figure  XIII 


the  coefficient  of  correlation,  is  0.107,  then  the  value  of 

CX'     10  7 
CX'  as  a  percentage  of  1  is  (TYr  —  TKrr  or  0.107.     Now  it  is 

evident  that  if  the  angle  COX'  were  zero,  the  tangent  would 
be  zero  and  hence  r  would  equal  zero.  It  is  also  evident 
that  r  can  never  be  greater  than  1  because  when  the  angle 
becomes  greater  than  45°  we  measure  the  tangent  from  the 
angle  made  by  the  correlation  line  and  the  line  YOY.     If 
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the  correlation  line  should  swing  to  the  left  of  YOY'  into 
the  second  quadrant,  then  r  would  be  negative  because  the 
x-values  would  grow  less  as  the  ^-values  increased.  The 
tangent  would  reach  its  maximum  length  when  the  correla- 
tion line  reached  135°  from  point  X'  on  the  circle  and  would 
have  a  value  of  —1.  Hence  the  values  of  r  will  vary  from 
+1  through  0  to  -1. 

Figure  XIV  shows  the  correlation  line  and  the  two 
regression  lines  for  the  data  in  Table  XXXVIII.  In  that 
table  the  mean  for  the  mathematics  series  is  80,16  and  is 
represented  in  Figure  XIII  by  the  Hne  YOY'.  The  mean 
for  the  Latin  series  is  73,09  and  is  represented  by  the  line 
XOX'.  The  line  MN  is  the  correlation  line  and  is  drawn 
so  as  to  make  an  angle  of  6°  7'  with  the  x-axis.  The  line 
CD  is  the  line  of  perfect  correlation  and  makes  an  angle  of 
45°  with  the  X-axis.  The  two  regression  Hnes  are  ST,  the 
regression  of  y  on  x,  which  makes  an  angle  of  13°  33'  with 
the  X-axis,  and  PQ,  the  regression  of  xon  y  which  makes  an 
angle  of  2°  42'  with  the  7-axis. 

The  Reliability  of  the  Correlation  Coeflacient. — We 
found  the  coefficient  of  correlation  between  abilities  in 
Latin  and  in  mathematics  to  be  0.107.  The  number  of 
cases  taken  to  get  this  value  was  310.  This  number  repre- 
sents but  a  small  part  of  the  people  in  high  school  taking 
Latin  and  mathematics.  The  question  arises :  If  we  were  to 
take  310  other  cases  and  compute  the  correlation  coefficient, 
would  it  be  0.107?  Or,  if  we  took  all  the  students  taking  Latin 
and  mathematics  in  high  schools  and  computed  the  corre- 
lation between  the  two  abilities,  would  it  still  be  0,107? 
We  cannot  answer  these  questions  dogmatically  and  say 
the  coefficient  of  correlation  thus  computed  would,  or  would 
not,  be  0.107,  but  we  can  apply  the  laws  of  "  chance  " 
and  speak  rather  dogmatically  from  the  facts  gained.  We 
know  that  the  reliability  of  the  coefficient  of  correlation, 
just  as  the  reliabiUty  of  the  mean  or  of  the  standard  devia- 
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tion,  depends  on  the  normality  of  the  distributions  in 
f|uestion.  If  the  distributions  are  approximately  normal, 
the  coeflficient  of  correlation  will  be  fairly  reliable.     On  the 
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Figure  XIV.     showing  the  position  op  the  correlation  line  and 

REGRESSION  LINES  FOR  DATA  IN  TABLE  XXXVIU 


other  hand,  if  the  distributions  do  not  approach  normaHty, 

the  rehabihty  of  the  coefficient  of  correlation  becomes  less. 

When   the   distributions   resemble   normality,    then   the 
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probable  error  (P.E.)  may  be  used  to  estimate  the  probable 
siabiliiy  of  the  coefficient.  We  find  from  the  normal  prob- 
ability cm-ve  that  P.E.  =  0.6745  cr. 

The  formula  for  the  probable  error  of  the  coefficient  of  cor- 
relation is 

l-r2 


P.E.,  =  0.6745 


Vn 


This  formula  shows  that  the  reliability  of  r  increases  as  N 
increases;  not  directly,  but  in  proportion  to  the  square  of 
the  number  of  cases.  This  is  true,  of  course,  because  when 
N  is  large,  the  value  of  P.E.^  becomes  less.  Therefore,  to 
double  the  reliability  of  a  coefficient  we  must  take  four 
times  the  number  of  cases;  to  triple  the  reliabihty  we  must 
take  nine  times  the  number  of  cases,  and  so  on.  For  r  to 
be  considered  fairly  reliable  it  should  be  at  least  three  times 
as  large  as  the  probable  error,  on  the  ground  that  it  is  very 
improbable  that  the  true  value  of  r  falls  outside  r±3  P.E. 
Applying  this  formula  to  Table  XXXVIII,  the  correlation 
between  abihties  in  Latin  and  mathematics,  we  have 
p  0.6745(1-0.107^)^^^3^ 

V310 

or  r  =  0.107 dbO.037.  Here  the  correlation  coefficient  is  a 
little  less  than  three  times  the  probable  error,  which  makes 
it  not  entirely  rehable  as  a  measure  of  correlation.  Whipple 
says:^  "In  general,  a  correlation,  hke  any  other  deter- 
mination, to  have  claim  to  scientific  attention  must  be  at 
least  twice  as  large  as  its  P.E.,  and  to  be  perfectly  satis- 
factory, should  be  perhaps  four  or  five  times  as  large." 

Spearman's  Method  of  Rank  Correlation. — Spearman's 
"  foot-rule  "  for  measuring  correlation  is  a  simple  method 
of  comparison  by  "  rank  "  or  "  position  "  rather  in  terms 
of  absolute  quantity.    This  method  is  becoming  popular 

^  Manual  of  Mental  and  Physical  Tests,  Part  I,  p.  40. 
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because  of  the  ease  of  computation.  It,  of  course,  is  less 
accurate  than  the  product-moment  method  by  Pearson. 
The  formula  for  this  method  is 

in  which  R  expresses  the  degree  of  rank  correlation  (not  to 
be  confused  with  r  in  the  Pearson  formula) ;  g  is  the  numeri- 
cal gains  in  rank  of  an  individual  in  the  second,  as  compared 
with  the  first  series;  and  N  is  the  number  _of  cases.  Table 
XLI  illustrates  the  computation  of  R. 

Table    XLI. — Correlation   Between   a   Class   in   Addition 
AND  Handwriting   Measured   by  the   Thorndike   Scale 

Hand- 
Pupil  Addition  writing  Gains,  g 

A 13 16 0 

B 11 14 0 

c 10 15 1 

D 9 11 0 

E 8 13 1 

F 7 9 0 

G 6 12 2 


6So  24 

Substituting  in  the  formula  R  =  l—  ^  _^    =  1 — j^  =  0 . 5. 

In  using  the  data  in  Table  XLI  to  illustrate  the  computa- 
tion of  the  degree  of  correlation  by  the  Spearman  Rank 
Method  we  proceed  as  follows : 

Rank  the  measures  in  each  series  in  the  order  of  their 
magnitude.  We  may  start  with  either  the  largest  or  small- 
est scores  in  ranking  them.  If  we  start  with  the  largest 
scores  in  one  series,  we  must  do  the  same  in  the  other,  and 
vice  versa.  Compute  the  amount  of  gain  in  rank  of  each 
score  value  in  the  second  series  over  the  rank  of  its  cor- 
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responding  score  in  the  first  series.  For  example:  pupils 
A,  B,  D,  and  F  did  not  gain  in  rank  in  handwriting  over  their 
respective  ranks  in  addition.  Pupil  c  ranks  third  in  addi- 
tion but  second  in  handwriting,  hence  his  gain  in  rank  is  1. 
Pupil  E  ranks  fifth  in  addition  and  fourth  in  handwriting, 
hence  gains  1.  Pupil  g  ranks  seventh  in  addition  and  fifth 
in  handwriting,  hence  gains  2.  The  sum  of  the  gains  is  4 
which  when  substituted  in  the  formula  gives  a  value  for  R 
equal  to  0.5. 

In  case  of  a  tie  in  the  rank  in  either  series  it  is  customary 
to  divide  the  ranks  in  such  a  manner  as  to  keep  the  total 
number  of  ranks  in  each  series  the  same.  If,  for  example, 
two  scores  ranked  fifth  each  should  be  assigned  a  value  of 
5.5  (that  is  one-half  of  5+6).  If  three  ranked  fifth,  they 
should  all  be  assigned  the  rank  of  6  (the  mean  of  the  fifth, 
sixth,  and  seventh  places). 

The  rank  method  should  be  used  only  when  iV  is  small, 
in  which  case  its  reliability  is  about  as  great  as  the  more 
accurate  product-moment  method. 
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Squares  axd  Square  Roots 


No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

1 

1 

1.000 

36 

12  96 

6.000 

71 

50  41 

8.426 

2 

4 

1.414 

37 

13  69 

6.083 

72 

51  84 

8.485 

3 

9 

1.732 

38 

14  44 

6.164 

73 

53  29 

8.544 

4 

16 

2.000 

39 

15  21 

6.245 

74 

54  76 

8.602 

5 

25 

2.236 

40 

16  00 

6.325 

75 

56  25 

8.660 

6 

36 

2.449 

41 

16  81 

6.403 

76 

57  76 

8.718 

7 

49 

2.648 

42 

17  64 

6.481 

77 

59  29 

8.775 

8 

64 

2 .  828 

43 

IS  49 

6.557 

78 

60  84 

8.832 

9 

81 

3.000 

44 

19  36 

6.633 

79 

62  41 

8.888 

10 

1  00 

3.162 

45 

20  25 

6.708 

80 

64  00 

8.944 

11 

1  21 

3.317 

46 

21  16 

6.782 

81 

65  61 

9.000 

12 

1  44 

3.484 

47 

22  09 

6.856 

82 

67  24 

9.055 

13 

1  69 

3.606 

48 

23  04 

6.928 

83 

68  89 

9.110 

14 

1  96 

3.742 

49 

24  01 

7.000 

84 

70  56 

9.165 

15 

2  25 

3.873 

60 

25  00 

7.071 

85 

72  25 

9.220 

16 

2  56 

4.000 

51 

26  01 

7.141 

86 

73  96 

9.274 

17 

2  89 

4.123 

52 

27  04 

7.211 

87 

75  69 

9.327 

18 

3  24 

4.243 

53 

28  09 

7.280 

88 

77  44 

9.381 

19 

3  61 

4.359 

54 

29  16 

7.348 

89 

79  21 

9.434 

20 

4  00 

4.472 

55 

30  25 

7.416 

90 

81  00 

9.487 

21 

4  41 

4.583 

56 

31  36 

7.483 

91 

82  81 

9.539 

22 

4  84 

4.690 

57 

32  49 

7.550 

92 

84  64 

9.592 

23 

5  29 

4.796 

58 

33  64 

7.616 

93 

86  49 

9.614 

24 

5  76 

4.899 

59 

34  81 

7.681 

94 

88  36 

9.695 

26 

6  25 

5.000 

60 

36  00 

7.746 

95 

90  25 

9.747 

26 

6  76 

5.099 

61 

37  21 

7.810 

96 

92  16 

9.798 

27 

7  29 

5.196 

62 

38  44 

7.874 

97 

94  09 

9.849 

28 

7  84 

5.292 

63 

39  69 

7.937 

98 

98  04 

9.899 

29 

8  41 

5.385 

64 

40  96 

8.000 

99 

98  01 

9.950 

30 

9  00 

5.477 

65 

42  25 

8.062 

100 

1  CO  00 

10.000 

31 

9  61 

5.568 

66 

43  56 

8.124 

101 

1  02  01 

10 . 050 

32 

10  24 

5.657 

67 

44  89 

8.185 

102 

1  04  04 

10.100 

33 

10  89 

5.745 

68 

46  24 

8.246 

lOS 

1  06  09 

10.149 

34 

11  56 

5.831 

69 

47  61 

8.307 

104 

1  08  16 

10 . 198 

36 

12  25 

5.916 

70 

49  00 

8.367 

105 

1  10  25 

10.247 

367 
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No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

106 

1  12  36 

10.296 

141 

1  98  81 

11.874 

176 

2  09  76 

13.266 

107 

1  14  49 

10.344 

142 

2  01  64 

11.916 

177 

3  13  29 

13.304 

108 

1  16  64 

10.392 

143 

2  04  49 

11.958 

178 

3  16  84 

13.342 

109 

1  18  81 

10.440 

144 

2  07  36i  12.0001 

179 

3  20  41 

13.379 

110 

1  21  00 

10.488 

146 

2  10  25 

12.042 

180 

3  24  00 

13.416 

111 

1  23  21 

10.536 

146 

2  13  16 

12.083 

181 

3  27  61 

13.454 

112 

1  25  44 

10.583 

147 

2  16  09 

12.124 

182 

2  31  24 

13.491 

113 

1  27  69 

10.630 

148 

2  19  04 

12.168 

183 

3  34  89 

13.528 

114 

1  29  96 

10.677 

149 

2  22  01 

12.207 

184 

3  38  56 

13.565 

116 

1  32  25 

10.724 

150 

2  25  00 

12.247 

185 

3  42  25 

13.601 

116 

1  34  56 

10.770 

151 

2  28  01 

12.288 

186 

3  45  96 

13 . 638 

117 

1  36  89 

10.817 

152 

2  31  04 

12.329 

187 

3  49  69 

13.675 

118 

1  39  24 

10.863 

153 

2  34  09 

12.369 

188 

3  53  44 

13.711 

119 

1  41  61 

10.909 

154 

2  37  16 

12.410 

189 

3  57  21 

13:748 

120 

1  44  00 

10.954 

166 

2  40  25 

12.450 

190 

3  61  00 

13.784 

121 

1  46  41 

11.000 

156 

2  43  36 

12.490 

191 

3  64  81 

13.820 

122 

1  48  84 

11.045 

167 

2  46  49 

12.530 

192 

3  68  64 

13.856 

123 

1  51  29 

11.091 

168 

2  49  64 

12.570 

193 

3  72  49 

13.892 

124 

1  53  76 

11.136 

169 

2  52  81 

12.610 

194 

3  76  36 

13.928 

126 

1  56  25 

11.180 

160 

2  56  00 

12.649 

195 

3  80  25 

13.964 

126 

1  58  76 

11.225 

161 

2  59  21 

12.689 

196 

3  84  16 

14.000 

127 

1  61  29 

11.269 

162 

2  62  44 

12.728 

197 

3  88  09 

14.030 

128 

1  63  84 

11.314 

163 

2  65  69 

12.767 

198 

3  92  04 

14.071 

129 

1  66  41 

11.358 

164 

2  68  96 

12.806 

199 

3  96  01 

14.107 

130 

1  69  00 

11.402 

166 

2  72  25 

12.845 

200 

4  00  00 

14.142 

131 

1  71  61 

11.446 

166 

2  75  56 

12.884 

201 

4  04  01 

14.177 

132 

1  74  24 

11.489 

167 

2  78  89 

12.923 

202 

4  08  04 

14.213 

133 

1  76  89 

11.533 

168 

2  82  24 

12.961 

203 

4  12  09 

14.248 

134 

1  79  56 

11.576 

169 

2  85  61 

13.000 

204 

4  16  16 

14.283 

136 

1  82  25 

11.619 

170 

2  89  00 

13.038 

206 

4  20  25 

14.318 

136 

1  84  96 

11.662 

171 

2  92  41 

13.077 

206 

4  24  36 

14.353 

137 

1  87  69 

11.705 

172 

2  95  84 

13.115 

207 

4  28  49 

14.387 

138 

1  90  44 

11.747 

173 

2  99  29 

13.153 

208 

4  32  64 

14.422 

139 

1  93  21 

11.790 

174 

3  02  76 

13.191 

209 

4  36  81 

14.457 

140 

1  96  00 

11.832 

176 

3  06  25 

13.229 

210 

4  41  00 

14.491 
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Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

____ 

211 

4  45  21 

14.526 

246 

6  05  16 

15.684 

281 

1 
7  89  61  16.763 

212 

4  49  44 

14.560 

247 

6  10  09 

15.716 

282 

7  95  24|l6.793 

213 

4  63  69 

14.595 

248 

6  15  04 

15.748 

283 

8  00  89 

16.823 

214 

4  57  96 

14.629 

249 

6  20  01 

15.780 

284 

8  06  56 

16.852 

215 

4  62  25 

14.663 

250 

6  25  00 

15.811 

285 

8  12  25 

16.882 

216 

4  68  56 

14.697 

251 

6  30  01 

15.843 

286 

8  17  96 

16.912 

217 

4  70  89 

14.731 

252 

6  35  04 

15.875 

287 

8  23  69 

16.941 

218 

4  75  24 

14.765 

253 

6  40  09 

15.906 

288 

8  29  44 

16.971 

219 

4  79  6114.7991 

254 

6  45  16 

15.937 

289 

8  35  21 

17.000 

220 

4  84  00 

14.832 

255 

6  50  25 

15.969 

290 

8  41  00 

17.029 

221 

4  88  41 

14.866 

256 

6  55  36 

16.000 

291 

8  46  81 

17.059 

222 

4  92  84 

14.900 

257 

6  60  49 

16.031 

292 

8  52  64117.088 

223 

4  97  29 

14.933 

258 

6  65  64 

16.062 

293 

8  58  49 

17.117 

224 

5  01  76 

14.967 

259 

6  70  81 

16.093 

294 

8  64  36 

17 . 146 

225 

5  06  25 

15.000 

260 

6  76  00 

16.125 

295 

8  70  25 

17 . 176 

226 

5  10  76 

15.033 

261 

6  81  21 

16.155 

296 

8  76  16 

17.205 

227 

5  15  29 

15.067 

262 

6  86  44 

16.186 

297 

8  82  09 

17.234 

228 

5  19  84 

15.100 

263 

6  91  69 

16.217 

298 

8  88  04 

17.263 

229 

5  24  41 

15.133 

264 

6  96  96 

16.248 

299 

8  94  01 

17.292 

230 

5  29  00 

15 . 186 

265 

7  02  25 

16.279 

300 

9  00  00 

17.321 

231 

5  33  61 

15.199 

266 

7  07  56 

16.310 

301 

9  06  01  17.349 

232 

5  38  24 

15.232 

267 

7  12  89 

16.340 

302 

9  12  04  17.378 

233 

5  42  89 

15.264 

268 

7  18  24 

16.371 

303 

9  18  09  17.407 

234 

5  47  56 

15.297 

269 

7  23  61 

16.401 

304 

9  24  16  17.436 

235 

5  52  25 

15.330 

270 

7  29  00 

16.432 

305 

9  30  25 

17.464 

236 

5  56  96 

15.362 

271 

7  34  41 

16.462 

306 

9  36  36 

17.493 

237 

5  61  69 

15.395 

272 

7  39  84 

16.492 

307 

9  42  49 

17.521 

238 

5  66  44 

15.427 

273 

7  45  29 

16.523 

308 

9  48  64 

17.550 

239 

5  71  21 

15.460 

274 

7  50  76 

16.553 

309 

9  54  81 

17.578 

240 

5  76  00 

15.492 

275 

7  56  25 

16.583 

310 

9  61  00 

17.607 

241 

5  80  81 

15.524 

276 

7  61  76 

16.613 

311 

9  67  21 

17.635 

242 

5  85  64 

15.556 

277 

7  67  29 

16.643 

312 

9  73  44 

17.664 

243 

5  90  49 

15.588 

278 

7  72  84 

16.673 

313 

9  79  69 

17.692 

244 

5  95  36 

15.620 

279 

7  78  41 

16.703 

314 

9  85  96 

17 . 720 

246 

6  00  25 

15.652 

280 

7  84  00 

16.733 

315 

9  92  25 

17.748 
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No. 

Square 

Square 
Root 

No. 

Sq 

uare 

Square 
Root 

No. 

Square 

Square 
Root 

316 

9  98  56 

17.776 

351 

12 

32  01 

18.735 

386 

14  89  96 

19.647 

317 

10  04  89 

17.804 

362 

12 

39  04 

18.762 

387 

14  97  69 

19.672 

318 

10  11  24 

17.833 

353 

12 

46  09 

18.788 

388 

15  05  44 

19.698 

319 

10  17  61 

17.861 

354 

12 

53  16 

18.815 

389 

15  13  21 

19.723 

320 

10  24  00 

17.889 

355 

12 

60  25 

18.841 

390 

15  21  00 

19.748 

321 

10  30  41 

17.916 

356 

12 

67  36 

18.868 

391 

15  28  81 

19.774 

322 

10  36  84 

17.944 

357 

12 

74  49 

18.894 

392 

15  36  64 

19.799 

323 

10  43  29 

17.972 

358 

12 

81  64 

18.921 

393 

15  44  49 

19.824 

324 

10  49  76 

18.000 

359 

12 

88  81 

18.947 

394 

15  52  36 

19.849 

325 

10  56  25 

18.028 

380 

12  96  00 

18.974 

395 

15  60  25 

19.875 

326 

10  62  76 

18.055 

361 

13 

03  21 

19.000 

396 

15  68  16 

19.900 

327 

10  69  29 

18.083 

382 

13 

10  44 

19.026 

397 

15  76  09 

19.925 

328 

10  75  84 

18.111 

363 

13 

17  69 

19.053 

398 

15  84  04 

19.950 

329 

10  82  41 

18.138 

364 

13 

24  96 

19.079 

389 

15  92  01 

19.975 

330 

10  89  00 

18.166 

365 

13 

32  25 

19.105 

400 

16  GO  00 

20.000 

331 

10  95  61 

18.193 

366 

13 

39  56 

19.131 

401 

16  08  01 

20.025 

S32 

11  02  24 

18.221 

367 

13 

46  89 

19.157 

402 

16  16  04 

20.050 

333 

11  08  89 

18.248 

333 

13 

54  24 

19.183 

403 

16  24  09 

20.075 

334 

-11  15  56 

18.276 

369 

13 

61  61 

19.209 

404 

16  32  16 

20.100 

335 

11  22  25 

18.303 

370 

13 

69  00 

19.235 

405 

16  40  25 

20.125 

336 

11  28  96 

18.330 

371 

13 

76  41 

19.261 

406 

16  48  36 

20.149 

337 

11  35  69 

18.358 

372 

13 

83  84 

19.287 

407 

16  56  49 

20.174 

338 

11  42  44 

18.385 

373 

13 

91  29 

19.313 

408 

16  64  64 

20.199 

339 

11  49  21 

18.412 

374 

13 

98  76 

19.339 

409 

16  72  81 

20.224 

340 

11  56  00 

18.439 

375 

14 

06  25 

19.365 

410 

16  81  00 

20.248 

S41 

11  62  81 

18.466 

376 

14 

13  76 

19.391 

411 

16  89  21 

20.273 

342 

11  69  64 

18.493 

377 

14 

21  29 

19.416 

412 

16  97  44 

20.298 

343 

11  76  49 

18.520 

378 

14 

28  84 

19.442 

413 

17  05  69 

20.322 

344 

11  83  36 

18.547 

379 

14 

36  41 

19.468 

414 

17  13  96 

20.347 

346 

11  90  25 

18.574 

380 

14 

44  00 

19.494 

416 

17  22  25 

20.372 

346 

11  97  16 

18.601 

381 

14 

51  61 

19.519 

416 

17  30  56 

20.396 

347 

12  04  09 

18.628 

382 

14 

59  24 

19.545 

417 

17  38  89 

20,421 

348 

12  11  04 

18.655 

383 

14 

66  89 

19.570 

418 

17  47  24 

20.445 

349 

12  18  01 

18.682 

384 

14 

74  56 

19.596 

419 

17  55  61 

20.469 

360 

12  25  GO 

18.708 

385 

14  82  25 

19.621 

420 

17  64  00 

20.494 
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No 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

421 

17  72  41 

20.518 

456 

20 

79  36 

21.354 

491 

24  10  81 

22.159 

422 

17  80  84 

20.543 

457 

20  88 

49 

21.378 

492 

24  20  64 

22.181 

423 

17  89  29 

20.567 

458 

21 

06 

81 

21.401 

493 

24  30  49 

22.204 

424 

17  97  76 

20.591 

459 

21 

16 

00 

21.424 

494 

24  40  36 

22.226 

425 

18  06  25 

20.610 

460 

21 

16 

00 

21.448 

495 

24  50  25 

22.249 

428 

18  14  76 

20.640 

461 

21 

25 

21 

21.471 

496 

24  60  16 

22.271 

427 

18  23  29 

20 . 664 

462 

21 

34 

44 

21.494 

497 

24  70  09 

22.293 

428 

18  31  84 

20.688 

463 

21 

43 

69 

21.517 

498 

24  80  04 

22.316 

429 

18  40  41 

20.712 

464 

21 

52 

96 

21.541 

499 

24  90  01 

22.338 

430 

18  49  00 

20.736 

465 

21 

62 

25 

21.564 

500 

25  00  00 

22.361 

431 

18  57  61 

20.761 

466 

21 

71 

56 

21 . 587 

501 

25  10  01 

22.383 

432 

18  66  24 

20.785 

467 

21 

80 

89 

21.610 

502 

25  20  04 

22.405 

433 

18  74  89 

20.809 

463 

21 

90  24 

21 . 633 

503 

25  30  09 

22.428 

434 

18  83  56 

20.833 

469 

21 

99 

61 

21 . 656 

504 

25  40  16 

22.450 

435 

18  92  25 

20.857 

470 

22 

09 

00 

21.679 

505 

25  50  25 

22.472 

436 

19  00  96 

20.881 

471 

22 

18 

41 

21.703 

506 

25  60  36 

22.494 

437 

19  09  69 

20.905 

472 

22 

27 

84 

21 . 726 

507 

25  70  49 

22.517 

438 

19  18  44 

20.928 

473 

22 

37 

29 

21 . 749 

508 

25  80  64 

22.539 

4SD 

19  27  21 

20.952 

474 

22 

46 

76 

21 . 772 

509 

25  90  81 

22.561 

440 

19  36  00 

20.976 

475 

22 

56 

25 

21.794 

510 

26  01  00 

22.583 

441 

19  44  81 

21.000 

476 

22 

65 

76 

21.817 

511 

26  11  21 

22.605 

442 

19  53  64 

21.024 

477 

22  75 

29 

21.840 

512 

26  21  44 

22.627 

443 

19  62  49 

21.048 

478 

22 

84 

84 

21.863 

513 

26  31  69 

22.650 

444 

19  71  36 

21.071 

479 

22 

94 

41 

21.886 

514 

26  41  96 

22.672 

445 

19  80  25 

21.095 

480 

23 

04 

00 

21.909 

515 

26  52  25 

22.694 

446 

19  89  16 

21.119 

481 

23 

13 

61 

21.932 

516 

26  62  56 

22.716 

447 

19  98  09 

21 . 142 

482 

22 

23 

24 

21.954 

517 

26  72  89 

22.738 

448 

20  07  04 

21.166 

483 

23 

32 

89 

21.977 

518 

26  83  24 

22.760 

449 

20  16  01 

21.190 

484 

23  42 

56 

22.000 

519 

26  93  61 

22.782 

450 

20  25  00 

21.213 

485 

23 

52 

25 

22.023 

520 

27  04  00 

22.804 

451 

20  34  01 

21.237 

486 

23 

61 

96 

22.045 

521 

27  14  41 

22.825 

452 

20  43  04 

21.260 

487 

23 

71 

69 

22.068 

522 

27  24  84 

22.847 

453 

20  52  09 

21.284 

488 

23 

81 

44 

22,091 

523 

27  35  29 

22.869 

454 

20  61  16 

21.307 

489 

23 

91 

21 

22.113 

524 

27  45  76 

22.891 

455 

20  70  25  21.3311 

1      1 

490 

24 

01 

00 

22.136 

526 

27  56  25 

22.913 
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No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

626 

27  66  76 

22.935 

561 

31  47  21 

23.685 

596 

35  52  16 

24.413 

627 

27  77  29 

22.956 

562 

31  58  44 

23.707 

597 

35  64  09 

24.434 

628 

27  87  84 

22.978 

563 

31  69  69 

23.728 

598 

35  76  04 

24.454 

529 

27  98  41 

23.000 

564 

31  80  96 

23.749 

599 

35  88  01 

24.474 

630 

28  09  00 

23.022 

565 

31  92  25 

23.770 

600 

36  00  00 

24.495 

531 

28  19  61 

23.043 

566 

32  03  56 

23.791 

601 

36  12  01 

24.515 

632 

28  30  24 

23.065 

567 

32  14  89 

23.812 

602 

36  24  04 

24.536 

533 

28  40  89 

23.087 

568 

32  26  24 

23.833 

603 

36  36  09 

24.556 

534 

28  51  56 

23.108 

569 

32  37  61 

23.854 

604 

38  48  16 

24.576 

535 

28  62  25 

23.130 

570 

32  49  00 

23.875 

605 

36  60  25 

24.597 

636 

28  72  96 

23.152 

571 

32  60  41 

23.896 

606 

36  72  36 

24.617 

537 

28  83  69 

23.173 

572 

32  71  84 

23.917 

607 

36  84  49 

24.637 

538 

28  94  44 

23.195 

573 

32  83  29 

23.937 

608 

36  96  64 

24.658 

539 

29  05  21 

23.216 

574 

32  94  76 

23.958 

609 

37  08  81 

24.678 

540 

29  16  00 

23.238 

576 

33  06  25 

23.979 

610 

37  21  00 

24.698 

641 

29  26  81 

23.259 

576 

33  17  76 

24.000 

611 

37  33  21 

24.718 

642 

29  37  64 

23.281 

577 

33  29  29 

24.021 

612 

37  45  44 

24.739 

643 

29  48  49 

23.302 

578 

33  40  84 

24.042 

613 

37  57  69 

24.759 

544 

29  59  36 

23.324 

579 

33  52  41 

24.062 

614 

37  69  96 

24.779 

645 

29  70  25 

23.345 

580 

33  64  00 

24.083 

615 

37  82  25 

24.799 

646 

29  81  16  23.367 

581 

33  75  61 

24.104 

616 

37  94  56 

24.819 

547 

29  92  09 

23.388 

582 

33  87  24 

24.125 

,617 

38  06  89 

24.839 

548 

30  03  04 

23.409 

583 

33  98  89 

24.145 

618 

38  19  24 

24.860 

549 

30  14  01 

23.431 

584 

34  10  56 

24.160 

619 

38  31  61 

24.880 

660 

30  25  00 

25.452 

585 

34  22  25 

24.187 

620 

38  44  00 

24.900 

651 

30  36  01 

23.473 

586 

34  33  96 

24.207 

621 

38  56  41 

24.920 

652 

30  47  04 

23.495 

587 

34  45  69 

24.228 

622 

38  68  84 

24.940 

553 

30  58  09 

23.516 

588 

34  57  44 

24.249 

623 

38  81  29 

24.960 

564 

30  69  10 

23.537 

589 

34  69  21 

24.269 

624 

38  93  70 

24.980 

655 

30  80  25 

23.558 

590 

34  81  00 

24.290 

625 

39  06  25 

25.000 

666 

30  91  36 

23.580 

591 

34  92  81 

24.310 

626 

39  18  76 

25.020 

657 

31  02  49 

23.601 

592 

35  04  64 

24.331 

627 

39  31  29 

25.040 

568 

31  13  64 

23.622 

593 

35  16  49 

24.3.52 

628 

39  43  84 

25.060 

569 

31  24  81 

23.643 

694 

35  28  36 

24.372 

629 

39  56  41  25.080 

660 

31  36  00 

23.664 

595 

35  40  25 

24.393 

630 

39  69  00  25.100 

SQUARES  AND  SQUARE  ROOTS 
Squares  and  Square  Roots — Continued 


373 


No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

631 

39  81  61 

25.120 

666 

44  35  56 

25.807 

701 

49 

14 

01 

26.476 

632 

39  94  24 

25.140 

667 

44  48  89 

25.826 

702 

49 

28 

04 

26.495 

633 

40  06  89 

25.159 

668 

44  62  24 

25.846 

703 

49 

42 

09 

26.514 

634 

40  19  56 

25.179 

669 

44  75  61 

25.865 

704 

49 

56 

16 

26.533 

635 

40  32  25 

25.199 

670 

44  89  00 

25.884 

705 

49  70  25 

26.552 

636 

40  44  96 

25.219 

671 

45  02  41 

25.904 

706 

49 

84 

36 

26.571 

637 

40  57  69 

25.239 

672 

45  15  84 

25.923 

707 

49 

98 

49 

26.589 

638 

40  70  44 

25.259 

673 

45  29  29 

25.942 

708 

50 

12 

64 

26.608 

639 

40  83  21 

25.278 

674 

45  42  76 

25.962 

709 

50 

26 

81 

26.627 

640 

40  96  00 

25.298 

675 

45  56  25 

25.981 

710 

50  41 

00 

26.646 

641 

41  08  81 

25.318 

676 

45  69  76 

26.000 

711 

50 

55 

21 

26.665 

642 

41  21  64 

25.338 

677 

45  83  29 

26.019 

712 

50 

69 

44 

26.683 

643 

41  34  49 

25.357 

678 

45  96  84 

26.038 

713 

50 

83 

69 

26.702 

644 

41  47  36 

25.377 

679 

46  10  41 

26.058 

714 

50 

97 

96 

26.721 

645 

41  60  25 

25.397 

680 

46  24  00 

26.077 

715 

51 

12 

25 

26.739 

646 

41  73  16 

25.417 

681 

46  37  61 

26.096 

716 

51 

26 

56 

26.758 

647 

41  86  09 

25.436 

682 

46  51  24 

26.115 

717 

51 

40 

89 

26.777 

648 

41  99  04 

25.456 

683 

46  64  89 

26.134 

718 

51 

55 

24 

26.796 

649 

42  12  01 

25.475 

684 

46  78  56}26.153| 

719 

51 

69 

61 

26.814 

650 

42  26  00 

25.495 

685 

46  92  25 

26.173 

720 

51 

84 

00 

26.833 

651 

42  38  01 

25.515 

686 

47  05  96 

26.192 

721 

51 

98 

41 

26.851 

652 

42  51  04 

25.534 

687 

47  19  69 

26.211 

722 

52 

12 

84 

26.870 

653 

42  64  09 

25.554 

688 

47  33  44 

26.230 

723 

52 

27 

29 

26.889 

654 

42  77  16 

25.593 

689 

47  47  21 

26.249 

724 

52 

41 

76 

26.907 

655 

42  90  25 

25.573 

690 

47  61  00 

26.268 

725 

52  56  25 

26.926 

656 

43  03  36 

25.612 

691 

47  74  81 

26.287 

726 

52 

70 

76 

26.944 

657 

43  16  49 

25.632 

692 

47  88  64 

26.306 

727 

52 

85 

29 

26.963 

658 

43  29  64 

25.652 

693 

48  02  49 

26.325 

728 

52 

99 

84 

26.981 

659 

43  42  81 

25.671 

694 

48  16  36 

26.344 

729 

53 

14  41 

27.000 

660 

43  56  00 

25.690 

695 

48  30  25 

26.363 

730 

53 

29 

00 

27.019 

661 

43  69  21 

25.710 

696 

48  44  16 

26.382 

731 

53 

43 

61 

27.037 

662 

43  82  44 

25.729 

697 

48  58  09 

26.401 

732 

53 

58 

24 

27.055 

663 

43  95  69 

25.749 

698 

48  72  04 

26.420 

733 

53 

72 

89 

27.074 

664 

44  08  96 

25.768 

699 

48  86  01 

26.439 

734 

53 

87 

56 

27.092 

665 

44  22  25 

25.788 

700 

49  00  00 

26.458 

735 

54 

02 

25 

27.111 
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No. 

Square 

Square 
Moot 

No. 

S'i-e  Sg^- 

No. 

Square  ' 

Square 
Root 

736 

54  16  96 

n .  129 

771 

59  44  41  • 

27.767 

806 

64  96  36 

28.390 

737 

54  31  69 

27 . 148 

772 

59  59  84 

27.785 

807 

65  12  49 

28.408 

738 

54  46  44 

27 . 166 

773 

59  75  29 

27.803 

808 

65  28  64 

28.425 

739 

54  61  21 

27 . 185 

774 

59  90  76 

27.821 

809 

65  44  81 

28.443 

740 

54  76  00 

27.203 

775 

60  06  25 

27.839 

810 

65  61  00 

28.460 

741 

54  90  81 

27.221 

776 

60  21  76 

27.857 

811 

65  77  21 

28.478 

742 

55  05  64 

27.240 

777 

60  37  29 

27.875 

812 

65  93  44 

28.496 

743 

55  20  49 

27.258 

778 

60  52  84 

27.893 

813 

66  09  69 

28.513 

744 

55  35  36 

27.276 

779 

60  68  41 

27.911 

814 

66  25  96 

28.531 

745 

55  50  25 

27.295 

780 

60  84  00 

27.928 

815 

66  42  25 

28.548 

746 

55  65  16 

27.313 

781 

60  99  61 

27.946 

816 

66  58  56 

28.566 

747 

55  80  09 

27.331 

782 

61  15  24 

27.964 

817 

66  74  89 

28.583 

748 

55  95  04 

27.350 

783 

61  30  89 

27.982 

818 

66  91  24 

28.601 

749 

56  10  01 

27.368 

784 

61  46  56 

28.000 

819 

67  07  61 

28.618 

750 

58  25  00 

27.386 

785 

61  62  25 

28.018 

820 

67  24  00 

28.636 

751 

56  40  01 

27.404 

786 

61  77  96 

28.036 

821 

67  40  41 

28.653 

752 

56  55  04 

27.423 

787 

61  93  69 

28.054 

822 

67  56  84 

28.671 

753 

56  70  09 

27.441 

788 

62  09  44 

28.071 

823 

67  73  29 

28.688 

754 

56  85  16 

27.459 

789 

62  25  21 

28.089 

824 

67  89  76 

28.705 

755 

57  00  25 

27.477 

790 

62  41  00 

28.107 

825 

68  06  25 

28.723 

756 

57  15  36 

27.495 

791 

62  56  81 

28.125 

826 

68  22  76 

28.740 

757 

57  30  49 

27.514 

792 

62  72  64 

28.142 

827 

68  39  29 

28.758 

758 

57  45  64 

27.532 

793 

62  88  49 

28.160 

828 

68  55  84 

28.775 

759 

57  60  81 

27.550 

794 

63  04  36 

28.178 

829 

68  72  41 

28.792 

760 

57  76  00 

27.568 

795 

63  20  25 

28.196 

830 

68  89  00 

28.810 

761 

57  91  21 

27.586 

796 

63  36  16 

28.213 

831 

69  05  61 

28.827 

762 

58  06  44 

27.604 

797 

63  52  09 

28.231 

832 

69  22  24 

28.844 

763 

58  21  69 

27.622 

798 

63  68  04 

28.249 

833 

69  38  89 

28.862 

764 

58  36  96 

27.641 

799 

63  84  01 

28.267 

834 

69  55  56 

28.879 

765 

58  52  25 

27.659 

800 

64  00  00 

28.284 

835 

69  72  25 

28.896 

766 

58  67  56 

27.677 

801 

64  16  01 

28.302 

836 

69  88  96 

28.914 

767 

58  82  89 

27.695 

802 

64  32  04 

28.320 

837 

70  05  69 

28.931 

768 

58  98  24 

27.713 

803 

64  48  09 

28.337 

838 

70  22  44 

28.948 

769 

59  13  61 

27.731 

804 

64  64  16 

28.355 

839 

70  39  21 

28.965 

770 

59  29  OO 

27.749 

806 

64  80  25 

28.373 

840 

70  56  00 

28.983 

i 
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No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

841 

70  72  81 

29.000 

876 

76  73  76 

29.597 

911 

82  99  21 

30.183 

842 

70  89  64 

29.017 

877 

76  91  29 

29.614 

912 

83  17  44 

30.199 

843 

71  06  49 

29.034 

878 

77  08  84 

29.631 

913 

83  35  69 

30.216 

844 

71  23  36 

29.052 

879 

77  26  41 

29.648 

914 

83  53  96 

30.232 

845 

71  40  25 

29.069 

880 

77  44  00 

29.665 

915 

83  72  25 

30.249 

846 

71  57  16 

29.086 

881 

77  61  61 

29.682 

916 

83  90  56 

30.265 

847 

71  74  09 

29 .  103 

882 

77  79  24 

29.698 

917 

84  08  89 

30.282 

848 

71  91  04 

29.120 

883 

77  96  89 

29.715 

918 

84  27  24 

30.299 

849 

72  08  01 

29.138 

884 

78  14  56 

29.732 

919 

84  45  61 

30.315 

850 

72  25  00 

29.155 

885 

78  32  25 

29.749 

920 

84  64  00 

30.332 

851 

72  42  01 

29.172 

886 

78  49  96 

29.766 

921 

84  82  41 

30.348 

852 

72  59  04 

29.189 

887 

78  67  69 

29.783 

922 

85  00  84 

30.364 

853 

72  76  09 

29.206 

888 

78  85  44 

29.799 

923 

85  19  29 

30.381 

854 

72  93  16 

29.223 

889 

79  03  21 

29.816 

924 

85  37  76 

30.397 

855 

73  10  25 

29.240 

890 

79  21  00 

29.833 

925 

85  56  25 

30.414 

856 

73  27  36 

29.257 

891 

79  38  81 

29.850 

926 

85  74  76 

30.430 

857 

73  44  49 

29.275 

892 

79  56  64 

29.866 

927 

85  93  29 

30.447 

858 

73  61  64 

29.292 

893 

79  74  49 

29.883 

928 

86  11  84 

30.463 

859 

73  78  81 

29.309 

894 

79  92  36 

29.900 

929 

86  30  41 

30.480 

860 

73  96  00 

29.326 

885 

80  10  25 

29.916 

930 

86  49  00 

30.496 

861 

74  13  21 

29.343 

896 

80  28  16 

29.933 

931 

86  67  61 

30.512 

862 

74  30  44 

29.360 

897 

80  46  09 

29.950 

932 

86  86  24 

30.529 

863 

74  47  69 

29.377 

898 

80  64  04 

29.967 

933 

87  04  89 

30.545 

864 

74  64  96 

29.394 

899 

80  82  01 

29.983 

934 

87  23  56 

30.561 

865 

74  82  25 

29.411 

900 

81  00  00 

30.000 

935 

87  42  25 

30.578 

866 

74  99  56 

29.428 

901 

81  18  01 

30.017 

936 

87  60  96 

30.594 

867 

75  16  89 

29.445 

902 

81  36  04 

30.033 

937 

87  79  69 

30.610 

868 

75  34  24 

29.462 

903 

81  54  09 

30.050 

938 

87  98  44 

30.627 

869 

75  51  61 

29.479 

904 

81  72  16 

30.067 

939 

88  17  21 

30.643 

870 

75  69  00 

29.496 

905 

81  90  25 

30.083 

940 

88  36  00 

30.659 

871 

75  86  41 

29.513 

906 

82  08  35 

30.100 

941 

88  54  81 

30.676 

872 

76  03  84 

29.530 

S07 

82  26  49 

30.116 

942 

88  73  64 

30.692 

873 

76  21  29 

29.547 

SOS 

82  44  64 

30.133 

943 

88  92  49 

30.708 

874 

76  38  76 

29.563 

909 

82  62  81 

30.150 

944 

89  11  36 

30.725 

875 

76  56  25 

29.580 

910 

82  81  00 

30.166 

945 

89  30  25 

30.741 
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Squares  and  Square  Roots — Continued 


No. 

Square 

Square 
Root 

No. 

Square 

Square 
Root 

946 

89 

49 

16 

30.757 

976 

95 

25 

76 

31.241 

947 

89 

68 

09 

30.773 

977 

95 

45 

29 

31.257 

948 

89 

87 

04 

30.790 

978 

95 

64 

84 

31.273 

949 

90 

06 

01 

30.806 

979 

95 

84 

41 

31.289 

950 

90 

25 

00 

30.822 

980 

96 

04 

00 

31.305 

951 

90 

44 

01 

30.838 

981 

96 

23 

61 

31.321 

952 

90 

63 

04 

30.854 

982 

96 

43 

24 

31.337 

953 

90 

82 

09 

30.871 

983 

96 

62 

89 

31.353 

954 

91 

01 

16 

30.887 

984 

96 

82 

56 

31.369 

965 

91 

20 

25 

30.903 

985 

97 

02 

25 

31.385 

956 

91 

39 

36 

30.919 

986 

97 

21 

96 

31.401 

957 

91 

58 

49 

30.935 

987 

97 

41 

69 

31.417 

958 

91 

77 

64 

30.952 

988 

97 

61 

44 

31.432 

959 

91 

96 

81 

30.968 

989 

97 

81 

21 

31.448 

960 

92 

16 

00 

30.984 

990 

98 

01 

00 

31.464 

961 

92 

35 

21 

31.000 

991 

98 

20 

81 

31.480 

962 

92 

54 

44 

31.016 

992 

98 

40 

64 

31.496 

963 

92 

73 

69 

31.032 

993 

98 

60 

49 

31.512 

964 

92 

92 

96 

31.048 

994 

98 

80 

36 

31.528 

966 

93 

12 

25 

31.064 

995 

99 

00 

25 

31.544 

966 

93 

31 

56 

31.081 

996 

99 

20 

16 

31 . 559 

967 

93 

50 

89 

31.097 

997 

99 

40 

09 

31.575 

968 

93 

70 

24 

31.113 

998 

99 

60 

04 

31.591 

969 

93 

89 

61 

31.129 

999 

99 

80 

01 

31.607 

970 

94 

09 

00 

31.145 

1000 

100 

00 

000 

31.623 

971 

94 

28 

41 

31.161 

972 

94 

47 

84 

31.177 

973 

94 

67 

29 

31 . 193 

974 

94 

86 

76 

31.209 

975 

95 

06 

25 

31.225 

INDEX 


Accomplishment  quotient  (A.Q.), 
193. 

Accuracy,  standard  of,  270. 

Aims  of  education,  statement  of 
and  adherence  to,  10;  Bob- 
bitt's  statement  of,  10;  gen- 
eral, 11;  value  of  general 
limited,  11;  must  be  definite, 
13. 

American  people,  a  practical  folk, 
6;  have  faith  in  schools,  52; 
support  schools  lavishly,  52. 

Arithmetic  mean,  279;  weighted, 
281;  computation  of  by  short 
method,  283. 

"  At  age  "  and  normal  child,  90. 

Averages,  279. 

Ayres,  criticizes  Binet  tests,  84-85, 
93-94;  handwriting  scale,  181; 
spelling  scale,  255;  measure- 
ment retardation,  acceleration, 
and  elimination,  254-257;  com- 
putation coefficient  of  correla- 
tion, 337-344. 


BibUography.  See  end  of  each 
chapter. 

Binet,  conception  of  intelligence, 
64 ;  age-grade  method  of  measu- 
ring intelhgence,  75;  diflerence 
between  individuals  one  of  kind, 
82;  his  later  conception,  83. 

Binet-Simon  tests,  67-68;  synopsis 
of,  68-70;  innovations,  72; 
brought  constituent  functions  of 
intelligence  into  play,  72;  kinds 
of  mental  functions  brought  into 
play,  74;  provisional  scale, 
1905,  75 ;  problems  to  be  solved, 
75-76;  A>Tes  criticizes,  84-85; 
age  at  which  tests  should  be 
assigned,    86-87;    problems   in 


scoring,  87-88;  all  -  or  -  none 
method,  88;  point  scale  for 
measuring,  88;  number  of  tests 
to  be  passed  each  age- level,  88; 
with  what  tests  shall  examina- 
tion begin,  89;  give  more  than 
composite  picture,  90;  criti- 
cisms of,  93;  do  not  determine 
limits  of  trait,  94-95 ;  some  fav- 
orable criticism  of,  99;  revisions 
and  e.xtension  of,  104-122; 
Stanford  revision,  104-107. 

Black,  W.  W.,  36. 

Bobbitt,  J.  F.,  10. 

Bowley,  on  correlation,  325. 

Bridges,  point  scale,  111-117. 

Burgess,  measurement  silent  read- 
ing, 172. 

Bureau  of  Standards,  Washing- 
ton, D.  C,  5,  40. 

Burt,  conception  of  general  intel- 
hgence, 65;  measurement  of 
backward  children,  113. 

Business,  compared  with  schools, 
42;  variables  in,  42. 


Cattell,  establishes  psychological 
laboratory,  60. 

Central  tendency,  measurement 
of,  Ch.  X,  278-303. 

Class  intervals,  274. 

Classification  of  pupils  by  edu- 
cational tests,  190-192. 

Confidence  of  pubhc,  must  cul- 
tivate, 49. 

Correlation,  measurement  of,  Ch. 
XII,  324-360;  need  for,  324; 
illustrating  computation  of,  data 
ungrouped,  328;  data  grouped 
and  complex,  331-337;  perfect 
positive,  331;  computing  by 
Ayres'  short  method,  337-344". 
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graphic  method  of  representa- 
tion, 344-350;  reliability  of, 
360. 

Creeds,  educational,  12-13;  sci- 
entist and,  13. 

Culture,  emphasizes  things  of 
mind,  25;    development  of,  26. 

Curve  plotting,  273. 


Danger  to  be  avoided,  12. 

Data,  on  school-room  problems 
lacking,  25;  rules  for  tabulating, 
272. 

Davenport,  on  correlation,  325. 

Dearborn  group  intelligence  tests, 
132. 

Defective  child,  compared  with 
normal,  100-102. 

De  Sanctis  tests  for  mental  defec- 
tion, 121. 

Deviation,  computation  of  mean, 
308;  with  grouped  data,  310; 
general  formula  for  computing, 
313;  standard  computation  of, 
315;  mean  and  standard  com- 
pared, 317-321. 

Diagnostic  tests.     See  Tests. 

Distribution,  of  subnormal  indi- 
viduals, 78;  table  of,  275; 
analysis  of,  275. 

Downe}',  on  will  profile,  142. 


Ebbinghaus'  conception  of  intel- 
ligence, 64. 

Economy,  affected  through  small 
savings,  16;  applied  to  indi- 
viduals, 18;  in  education,  18- 
19;  through  measurements,  Ch. 
II,  10-55. 

Educative  situation,  8. 

Educational  age,  192;  compared 
to  mental  age,  192. 

Educational  measurements,  has 
scoffers  and  zealots,  43;  can- 
not plot  curve  of  genius,  43. 

Educational  quotient  (E.Q.),  192, 
194. 

Efficiency,  demand  for,  1;  in  the 
Gary  schools,  2;  teachers',  2; 
books  on,  2-3;  how  mechanic 
determines,  3;  of  school  system, 


3;  American  citizen,  3-4;  meas- 
ured in  higher  education  6-7; 
working  hypothesis  for  acquir- 
ing, 8. 

Energy  must  be  conserved,  15. 

Equation  of  straight  line  of  re- 
gression, 351. 

Error,  question  of,  262;  compen- 
sating vs.  cumulative,  270. 

Examinations,  time  consumed  in 
giving,  147 ;  attitude  of  teachers 
and  pupils  toward,  148. 

Experience,  profiting  by  that  of 
others,  39;  in  business,  39-40; 
in  medicine,  40;  in  manufac- 
turing stoves,  41. 


Facts,  inability  to  show  works 
hardships,  32;  business  man 
and,  33;  "show  up"  school 
by,  34. 

Factual  basis  for  education,  20. 

Fechner,  57. 

Fields  of  educational  tests  and 
measurements,  53-54. 

Form  board  for  measuring  intelh- 
gence,  108. 

Freeman  handwriting  scale,  for- 
mation of,  182. 


Genius,  differs  from  intelligence, 
63. 

Goddard,  100. 

Gray,  handwriting  score  card, 
182-184. 

Gregory,  study  of  reading  vocab- 
ularies, 237;  scoring  American 
histories,  240-247. 

Gregory-Spencer  geography  tests, 
214;  divisions  of,  216;  selec- 
tion of  cities  in,  217;  advan- 
tages of  tests  thus[designed,  221 ; 
determination  of  scores  in,  222; 
objections,  224;  effects  of  in- 
correct statements,  225. 

Group  intelligence  tests,  122-136; 
principles  involved,  123;  re- 
quirements for,  124;  Terman, 
125;  National,  127;  Haggerty, 
129;  Otis,  131;  Dearborn,  132; 
number  given,  135. 
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Habermann,  definition  of  normal 
individual,  78. 

Haggerty,  intelligence  examina- 
tions, 129;  summarizes  work  of 
measurements,  140;  on  factors 
which  condition  success,  140. 

Hall,  estabhshes  psychological 
laboratory,  60. 

Hardwick,  point  scale,  111-117. 

Healy,  picture  completion  test, 
108;  criticizes  mental  testing, 
143. 

Human  energj'  must  be  conserved, 
15. 

Hume's  description  of  meta- 
physical sciences,  20-21. 


Inductive  method,  not  used  in 
pedagogy,  24. 

Intelligence :  measurement  of, 
Chs.  III-IV,  56-144;  some 
major  pi'oblems  of,  56;  what 
it  is,  63-65;  definitions  of,  63; 
differs  from  memory  and  genius, 
63-67;  Zeihen  attempts  to 
measure,  65;  as  general  faculty 
of  the  mind,  65;  Burt's  investi- 
gations of,  65;  general  faculty 
of,  65-67;  inability  to  define 
does  not  prohibit  measure- 
ments, 67;  maturity  of,  70; 
depends  on  correlation  and  in- 
terfunctioning,  73;  differences 
in  degree  and  kind,  82;  choosing 
tests  to  measure,  S3;  tests  must 
not  be  influenced  by,  84;  tests 
must  have  symptomatic  value, 
85;  tests  must  not  depend  on 
use  of  language,  85-86;  group 
tests  of,  122-136;  levels  for 
various  occupations,  132;  sum- 
mary and  evaluation,  136-145; 
methods  crude,  137;  sympo- 
sium on,  138;  suggestions  to 
teachers,  144. 

Intelligence  quotient  (I.Q.),  81, 
192. 


Jaedorholm,  experiments  on  sub- 
normal children,  83. 
James,  38. 


Jolinson,  on  grading  high-school 
students,  163. 

Jones,  studies  on  spelling  vocab- 
ularies, 236. 


Kant,  on  psychology  as  a  science, 

59. 
Kelly,  F.  J.,  on  teachers'  marks, 

164. 
Keeping  records  of  methods  tried, 

49. 
KnoUin,  tests  "  hoboes,"  132. 
Knowledge,  attribute  of  man  of 

science,  9. 
Kuhlmann,  on  mental  maturity, 

92. 


Limited  quantities  make  measure- 
ments necessary,  14. 
Lowell,  on  simplicity  of  tests,  124. 


Marking  system,  now  in  vogue, 
150;  inadequacy  of,  156;  in- 
efficient, 158. 

McCall,  location  of  zero  point, 
199;   definition  of  median,  289. 

Mean,  279;  computation  of  mean 
deviation,  308. 

Measurements,  in  physical  sci- 
ences, 5 ;  teachers  should  use,  9 ; 
suggestions  to  teachers  for,  144; 
of  school  achievements  new, 
152;  more  exact  make  educa- 
tion a  science,  164;  by  opinion, 
comparison,  and  standardized 
tests,  189;  in  other  fields,  Ch. 
VIII,  232-258;  of  materials  of 
instruction  233-249;  of  spelling 
vocabulary,  233;  of  physical 
growth  of  children,  249-251;  of 
school  buildings,  251-253;  of 
retardation,  acceleration,  and 
elimination,  253-258 ;  educa- 
tional compared  with  other 
fields,  264;  of  central  tendency, 
Ch.  X,  273-303;  of  variabihty, 
304-323. 

Measures,  distribution  of  about 
central  tendency,  262;  undis- 
tributed, 271. 
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Median,  287;  definitions  of,  287- 
289;  formula  for  finding,  290, 
computation  of  in  simple  dis- 
tribution, 291;  in  complex  dis- 
trbution,  293;  compared  with 
mean  and  mode,  302. 

Medical  criterion  for  separating 
subnormal  from  normal,  79-80. 

Mental  age,  coefficient  of,  90-91; 
how  to  find,  106. 

Mental  maturity,  age  of,  91-93; 
Terman's  view  of,  92. 

Meumann,  conception  of  intelli- 
gence, 64;  criticizes  idea  of 
general  factor  in  inteUigence, 
66;  proposed  reorganization  of 
Binet  scale,  117. 

Mode,  286,  302. 


National  inteUigence  tests,  127. 
Need  for  definite  measurements, 

Ch.  V,  147-179. 
Normal  probabUity  curve,  77. 
Normal     frequency     curve.     See 

Normal  probabihty  curve. 
Normal  individual,  Habermann's 

definition   of,    78;    criteria   for 

separating  from  subnormal,  79. 
Norsworthy,       experiments       on 

feeble-minded  children,  83. 


Objections  to  educational  tests, 
43-iS. 

Obstacles  to  rational  educational 
reform,  28. 

Opinions:  kinds  and  uses,  21; 
when  worthless,  22 ;  philosophi- 
cal vs.  knowable  facts,  24-25; 
when  no  excuse  for,  26. 

Otis  group  inteUigence  tests,  131. 


Paterson,  scale  of  performance 
tests,  109-110. 

Pearson,  experiments  on  subnor- 
mal children,  83;  on  correla- 
tion, 325-326;  equation  for 
regression  lines,  353. 

Pedagogical  criterion  for  sepa- 
rating subnormal  from  normal, 
79. 


Perfonnance  tests,  86. 

Percentiles,  303. 

Picture  completion  tests,  107-108. 

Pintner,  scale  of  performance 
tests,  109-110. 

Point  scale  for  measuring  mental 
abUity,  111-117;  distribution  of 
tests,  114-115. 

Porteus,  on  mental  maturity,  92; 
on  Umitations  of  Binet  tests,  97. 

Pragmatism,  6. 

Probable  error  (P.E.),  306. 

Progress  conditioned  by  abiUty 
to  measure,  4. 

Psychiatrist,  work  of,  61-63. 

Psychological  criteria  for  sepa- 
rating subnormal  from  normal, 
79-80. 

Psychological  measurements  re- 
tarded, 59;  made  slow  progress 
from  Aristotle,  61. 

Public  asking  for  a  ledger  account, 
31;  interested  in  education,  51. 

Pyle,  criticizes  Binet  tests,  97. 


Quadrants,  349. 

Quantities    measured    indirectly, 

266. 
Quartile  deviation,  306. 
QuartUes  and  percentUes,  303. 


Regression  lines,  equation  for, 
351;  Pearson's  equation  for, 
353. 

Rice,  J.  M.,  28. 

Rossolimo,  mental  measurements, 
120. 

Rugg,  definition  of  median,  287. 


Salt  Lake  City  survey,  30. 

Scale  of  performance  tests,  109. 

Scale,  definition  of,  159;  ideal 
must  have,  161;  has  been  sub- 
jective, 167;  some  characteris- 
tics of  ideal,  196;  zero  point  of, 
196;  making  steps  equal,  202- 
211;  must  measure  desired 
product,  211. 

Scores,  value  to  be  assigned,  226; 
weighting,    228;    accumulation 
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and  difficulty,  229;  interval, 
289. 

Scoring,  tests  and  treatment  of 
measures,  Ch.  VII,  227;  teach- 
ers' judgment  in,  228;  Ameri- 
can histories,  240-249. 

School,  purpose  of,  18;  is  a  state 
monopoly,  32;  compared  to 
business,"  42 ;   variables  in,  42. 

Seashore  criticizes  Binet  tests, 
96-97. 

Secrist,  definition  of  median,  288. 

Series,  discrete  and  continuous, 
271. 

Single  variable,  law  of,  172;  illus- 
tration of,  172-174. 

Social  economic  criteria  for  mental 
deficiency,  78-79. 

Spearman,  conception  of  general 
intelligence,  65-66;  mental  ma- 
turity, 92;  method  of  rank 
correlation,  362. 

Specifications,  manufacturer  has, 
41;   schools  should  have,  42. 

Spelling  vocabulary,  measurement 
of,  233;  of  Ayres'  scale,  234. 

Standards,  estabUshment  of,  34- 
39;  bricklayer  has,  36;  three 
kinds  needed,  36;  quantity,  36; 
time,  38;  quality,  39;  changing, 
39. 

Stanford  Revision  Binet  Scale, 
104-107. 

State  examination  questions  ex- 
amined, 216. 

Statistics,  general  statement  of, 
Ch.  IX,  260-277;  uses  in  other 
fields,  261;  definition  of,  267; 
laws  of  statistical  regularity, 
268;  methods  of,  269;  limita- 
tions of,  270;  imderstanding 
statistical  formulas,  276. 

Stem,  definition  of  intelligence, 
63;  coefficient  of  mental  age,  90. 

Strayer  and  Engelhardt,  score 
card,  2,51-253. 

Superintendents'  meeting  at  Indi- 
anapolis, 28. 

Supervision,  improved  by  meas- 
urements, 165. 


T  scale,  199-202. 


Talent,  differs  from  intelligence, 
63. 

Teachers,  influence  measured,  7; 
must  know  why  changes  are 
made,  49-50;  must  take  initia- 
tive, 153. 

Technical  scientist  and  educa- 
tional creeds,  12. 

Terman,  use  of  intelUgence  quo- 
tient, 91;  on  maturity  of  intel- 
Ugence, 92;  revision  of  Binet 
scale,  104-107;  group  tests, 
125 ;  symposium  on  intelligence, 
138;  T  scale,  201. 

Tests,  educational  curiosities,  9; 
limitations  of,  91;  use  of  intel- 
ligence, 132;  army  mental,  132; 
purposes  of  not  understood,  154; 
what  they  measure,  155;  do 
not  indicate  cause  of  conditions, 
168;  how  differ  from  other  ex- 
aminations, 169;  how  made, 
169-172;  when  should  be  given, 
174;  number  of  times,  176; 
how  improve  instruction,  177; 
kinds  most  important,  178; 
classifying  and  designing,  180- 
185;  diagnostic  vs.  general,  180; 
degree  to  which  diagnostic,  181 ; 
used  in  Cleveland  survey,  185; 
formal  and  reasoning,  185;  rate 
and  development,  186;  quan- 
tity, difficulty,  and  time,  187; 
subject-matter  for,  194;  deter- 
mined by  information  desired, 
196;  simple  in  appHcation,  211; 
not  require  too  much  time,  212; 
scoring  and  treatment  of  meas- 
ures, Ch.  VII,  214-230;  pr9b- 
lems  in  scoring,  214;  making 
a  geogi'aphy  test,  214;  values 
to  be  assigned  to  scores,  226. 

Testing,  problems  of,  100. 

Thinking,  quantitatively,  5-6. 

Thomdike,  48 ;  on  general  faculty 
of  intelligence,  66;  established 
zero  in  handwriting,  198;  defi- 
nition of  median,  288. 

Time,  relation  to  school  products, 
28. 

Titchener,  58;  establishes  psycho- 
logical laboratory,  60. 

Tradition,  strength  of,  9. 
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Variability,  measurement  of,  Ch. 

XI,   304-323;    how  measured, 

305;  measures  of  absolute,  305; 

coefficient    of,     321;      Pearson 

formula    for,    322;     Thorndike 

formula  for,  323. 
Vocabularies,    233 ;     reading  and 

spelling  of  Oregon  school  chil- 

djen,  337. 


Wallin,  criticizes  Binet  scale,  118. 

Waste,  teachers  should  eliminate, 
9;  elimination  of,  14;  not  pecu- 
liar to  American  schools,  14; 
opportunities  for  in  education, 
15;  how  business  man  elimi- 
nates, 16;  much  and  varied  in 
education,  17. 


V/att,  4. 

Weber,  58;  experimental  work  in 
psychology,  58-59;   laws,  58. 

Woody,  arithmetic  scale,  195. 

Wooters,  21. 

Wundt,  57;  establishes  psycho- 
logical laboratory,  57-58;  ef- 
fects of  laboratories,  60. 


Yerkes,  point  scale,  111-117. 


Zeilen,  attempts  to  measure  in- 
telligence, 65. 

Zero,  point  of  scale,  161,  196;  in 
composition,  197;  in  penman- 
ship, 198;  McCall  establishes, 
199;  in  T  scale,  200. 
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